
    A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

    Recent technological advances have greatly improved the performance and features of embedded systems. With the number of mobile devices alone now nearly equal to the population of the Earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing the power consumption of embedded systems. We discuss the need for power management and provide a classification of the techniques along several important parameters to highlight their similarities and differences. This paper is intended to help researchers and application developers gain insight into the working of power management techniques and design even more efficient high-performance embedded systems of tomorrow.

    Dynamic Voltage Scaling Techniques for Energy Efficient Synchronized Sensor Network Design

    Building energy-efficient systems is one of the principal challenges in wireless sensor networks. Dynamic voltage scaling (DVS), a technique to reduce energy consumption by varying the CPU frequency on the fly, has been widely used in other settings to accomplish this goal. In this paper, we show that changing the CPU frequency can affect the timekeeping functionality of some sensor platforms. This phenomenon can cause an unacceptable loss of time synchronization in networks that require tight synchrony over extended periods, thus preventing all existing DVS techniques from being applied. We present a method for reducing energy consumption in sensor networks via DVS while minimizing the impact of CPU frequency switching on time synchronization. The system is implemented and evaluated on a network of 11 Imote2 sensors mounted on a truss bridge and running a high-fidelity continuous structural health monitoring application. Experimental measurements confirm that the algorithm significantly reduces network energy consumption compared with the same network not using DVS, while requiring significantly fewer re-synchronization actions than a classic DVS algorithm.
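    The abstract does not spell out the mechanism, but the usual reason a frequency switch disturbs timekeeping is that, on some platforms, the software clock counts timer ticks derived from the CPU clock, so the tick period changes silently. The following is a minimal sketch of that effect and of a compensation step applied around a switch; the clock structure and the read_tick_counter/set_cpu_frequency hooks are hypothetical illustrations, not the paper's implementation.

```python
# Minimal sketch (not the authors' implementation): a software clock whose tick
# rate depends on the CPU frequency, plus a rescaling step that keeps logical
# time continuous across a DVS frequency switch.

class SoftwareClock:
    def __init__(self, cpu_hz: float):
        self.cpu_hz = cpu_hz          # ticks per second at the current frequency
        self.base_seconds = 0.0       # accumulated logical time
        self.base_ticks = 0           # hardware tick count when cpu_hz last changed

    def now(self, ticks: int) -> float:
        """Logical time in seconds, given the current hardware tick count."""
        return self.base_seconds + (ticks - self.base_ticks) / self.cpu_hz

    def rescale(self, ticks: int, new_cpu_hz: float) -> None:
        """Fold elapsed ticks into base_seconds before the tick rate changes,
        so logical time stays continuous across a frequency switch."""
        self.base_seconds = self.now(ticks)
        self.base_ticks = ticks
        self.cpu_hz = new_cpu_hz

def switch_frequency(clock: SoftwareClock, read_tick_counter, set_cpu_frequency, new_hz: float):
    # Without the rescale step, every tick after the switch would be converted
    # with the wrong period and nodes would drift out of synchronization.
    clock.rescale(read_tick_counter(), new_hz)
    set_cpu_frequency(new_hz)
```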

    Trade-off between Energy Savings and Execution Time Applying DVS to a Microprocessor

    DVS (Dynamic Voltage Scaling) is a technique used for reducing the power consumption of microprocessors. The power consumed by these circuits has a main component (dynamic power) that is proportional to the square of the supply voltage. Additionally, for every supply voltage there is a maximum value of the clock frequency. The advantage of using DVS is that the supply voltage (and hence the clock frequency) can be adjusted depending on the specific needs during execution. The DVS concept has been used in some commercial products such as Transmeta’s Crusoe [1], Intel SpeedStep [2], AMD K6 [3], and Hitachi SH4 [4]. The DVS algorithm proposed in this work is based on the trade-off between the application’s execution time and the energy consumed by the microprocessor. Indirectly, by controlling the execution time, the consumed energy is controlled as well: a longer execution time results in less energy demanded by the CPU. The algorithm has been implemented on a platform with an Intel XScale PXA255 microprocessor, and the energy savings have been calculated by directly measuring currents and voltages on the platform. Using this technique it is possible to achieve up to 50% power savings at the cost of a 50% longer execution time.
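    To see why trading execution time for energy works, recall that dynamic power scales roughly as C·V²·f and that lower frequencies permit lower supply voltages. Here is a back-of-the-envelope sketch under that standard model; the capacitance, cycle count, and voltage/frequency pairs are illustrative placeholders, not PXA255 datasheet values.

```python
# Back-of-the-envelope sketch of the DVS energy/execution-time trade-off using
# the standard dynamic-power model P_dyn ≈ C * V^2 * f. All numbers are illustrative.

def dynamic_energy(c_eff: float, volts: float, freq_hz: float, cycles: float) -> float:
    power = c_eff * volts ** 2 * freq_hz    # dynamic power in watts
    runtime = cycles / freq_hz              # execution time in seconds
    return power * runtime                  # energy = C * V^2 * cycles

CYCLES = 4e9        # amount of work, in CPU cycles (illustrative)
C_EFF = 1e-9        # effective switched capacitance (illustrative)

operating_points = [    # (frequency in Hz, minimum supply voltage) - illustrative
    (400e6, 1.30),
    (300e6, 1.10),
    (200e6, 1.00),
]

for freq, volts in operating_points:
    e = dynamic_energy(C_EFF, volts, freq, CYCLES)
    t = CYCLES / freq
    print(f"{freq/1e6:.0f} MHz @ {volts:.2f} V: time {t:.1f} s, dynamic energy {e:.2f} J")
```

    In this model the energy per cycle depends only on V², so the savings come from the lower voltage that a lower frequency allows, while the cost is the proportionally longer runtime.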

    Energy aware approach for HPC systems

    High-performance computing (HPC) systems require energy during their full life cycle, from design and production to transportation, usage, and recycling/dismantling. Because of increasing ecological and cost awareness, energy performance is now a primary focus. This chapter focuses on the usage aspect of HPC and on how adapted and optimized software solutions can improve energy efficiency. It provides a detailed explanation of server power consumption, and discusses HPC applications, phase detection, and phase identification. The chapter also argues that having the load and memory access profiles is insufficient for an effective evaluation of the power consumed by an application. The available leverages in HPC systems are also presented in detail. The chapter proposes solutions for modeling the power consumption of servers, which allows designing power prediction models for better decision making. These approaches allow the deployment and usage of a set of available green leverages, permitting energy reduction.
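    As a point of reference for the modeling discussion, the commonly used utilization-based baseline estimates node power as a linear function of CPU load. The sketch below shows only that naive baseline; the chapter's point is precisely that load and memory profiles alone are not enough, so phase-aware models refine it. The wattage figures are illustrative assumptions.

```python
# Naive utilization-based server power baseline: P ≈ P_idle + (P_max - P_idle) * load.
# Shown as the reference model that richer, phase-aware models improve upon.

def baseline_power(p_idle_w: float, p_max_w: float, cpu_load: float) -> float:
    """Estimate server power (W) from CPU load in [0, 1]."""
    cpu_load = min(max(cpu_load, 0.0), 1.0)
    return p_idle_w + (p_max_w - p_idle_w) * cpu_load

# Illustrative numbers only: a node idling at 120 W and peaking at 280 W.
for load in (0.0, 0.25, 0.5, 1.0):
    print(f"load {load:.2f}: ~{baseline_power(120.0, 280.0, load):.0f} W")
```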

    Performance and power optimizations in chip multiprocessors for throughput-aware computation

    The so-called "power (or power density) wall" has caused core frequency (and single-thread performance) scaling to slow down, giving rise to the era of multi-core/multi-thread processors. For example, the IBM POWER4 processor, released in 2001, incorporated two single-thread cores into the same chip. In 2010, IBM released the POWER7 processor with eight 4-thread cores in the same chip, for a total capacity of 32 execution contexts. The ever-increasing number of cores and threads gives rise to new opportunities and challenges for software and hardware architects. At the software level, applications can benefit from the abundant number of execution contexts to boost throughput, but this challenges programmers to create highly parallel applications and operating systems capable of scheduling them correctly. At the hardware level, the increasing core and thread count puts pressure on the memory interface, because memory bandwidth grows at a slower pace --- a phenomenon known as the "bandwidth (or memory) wall". In addition to memory bandwidth issues, chip power consumption rises due to manufacturers' difficulty in lowering operating voltages sufficiently with every processor generation. This thesis presents innovations to improve bandwidth and power consumption in chip multiprocessors (CMPs) for throughput-aware computation: a bandwidth-optimized last-level cache (LLC), a bandwidth-optimized vector register file, and a power/performance-aware thread placement heuristic. In contrast to state-of-the-art LLC designs, our organization avoids data replication and, hence, does not require keeping data coherent. Instead, the address space is statically distributed over the entire LLC in a fine-grained interleaved fashion. The absence of data replication increases the effective cache capacity, which results in better hit rates and higher bandwidth compared to a coherent LLC. We use double buffering to hide the extra access latency due to the lack of data replication. The proposed vector register file is composed of thousands of registers and organized as an aggregation of banks. We leverage this organization to attach small special-function "local computation elements" (LCEs) to each bank. This approach --- referred to as the "processor-in-regfile" (PIR) strategy --- overcomes the limited number of register file ports. Because each LCE is a SIMD computation element and all of them can proceed concurrently, the PIR strategy constitutes a highly parallel super-wide-SIMD device, ideal for throughput-aware computation. Finally, we present a heuristic to reduce chip power consumption by dynamically placing software (application) threads across hardware (physical) threads. The heuristic gathers chip-level power and performance information at runtime to infer characteristics of the applications being executed. For example, if an application's threads share data, the heuristic may decide to place them on fewer cores to favor inter-thread data sharing and communication. In that case, the number of active cores decreases, which presents a good opportunity to switch off the unused cores to save power. It is increasingly difficult to find bulletproof (micro-)architectural solutions for the bandwidth and power scalability limitations in CMPs. Consequently, we think that architects should attack those problems from different flanks simultaneously, with complementary innovations.
    This thesis contributes a battery of solutions to alleviate those problems in the context of throughput-aware computation: 1) a bandwidth-optimized LLC; 2) a bandwidth-optimized register file organization; and 3) a simple technique to improve power/performance efficiency.
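    The replication-free LLC hinges on every cache line having exactly one statically determined home bank. A minimal sketch of such fine-grained interleaving is given below; the line size and bank count are illustrative assumptions, not the values used in the thesis.

```python
# Minimal sketch of fine-grained static address interleaving across LLC banks:
# every cache-line address maps to exactly one bank, so no line is ever
# replicated and no coherence is needed among banks. Parameters are illustrative.

LINE_SIZE_BYTES = 128      # cache-line size (illustrative)
NUM_LLC_BANKS = 32         # number of LLC banks/slices (illustrative)

def home_bank(physical_address: int) -> int:
    """Return the single LLC bank that owns this address."""
    line_number = physical_address // LINE_SIZE_BYTES
    return line_number % NUM_LLC_BANKS   # consecutive lines go to consecutive banks

# Consecutive cache lines land on different banks, spreading bandwidth demand.
for addr in range(0, 8 * LINE_SIZE_BYTES, LINE_SIZE_BYTES):
    print(f"address {addr:#08x} -> bank {home_bank(addr)}")
```

    Because each line has a single home, a lookup is resolved entirely within that bank, which is what removes the need for coherence traffic between LLC slices.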

    Coordinating Resource Use in Open Distributed Systems

    In an open distributed system, computational resources are peer-owned and distributed over time and space. The system is open to interactions with its environment, and resources can dynamically join or leave the system, or can be discovered at runtime. This dynamicity creates opportunities to carry out computations without statically owned resources, harnessing the collective compute power of the resources connected by the Internet. However, realizing this potential requires efficient and scalable resource discovery, coordination, and control, which present challenges in a dynamic, open environment. In this thesis, I present an approach to address these challenges by separating the functionality concerns of concurrent computations from those of coordinating their resource use, with the purpose of reducing programming complexity and aiding the development of correct, efficient, and resource-aware concurrent programs. As a first step towards effectively coordinating distributed resources, I developed DREAM, a Distributed Resource Estimation and Allocation Model, which enables computations to reason about the future availability of resources. I then developed a fine-grained resource coordination scheme for distributed computations. The coordination scheme integrates DREAM-based resource reasoning into a distributed scheduler for deciding and enforcing fine-grained resource-use schedules for distributed computations. To control the overhead caused by the coordination, a tuner is implemented which explicitly balances the overhead of the control mechanisms against the extent of control exercised. The effectiveness and performance of the resource coordination approach have been evaluated using a number of case studies. Experimental results show that the approach can effectively schedule computations to support various coordination objectives, such as ensuring Quality-of-Service, power-efficient execution, and dynamic load balancing. The overhead caused by the coordination mechanism is relatively modest and adjustable through the tuner. In addition, the coordination mechanism does not add extra programming complexity to computations.
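    The abstract does not describe how the tuner trades overhead against control, so the following is only a hypothetical sketch of one plausible shape of such a component: it watches the fraction of time spent coordinating and stretches or shrinks the coordination interval to keep that fraction near a target. The class name, thresholds, and adaptation rule are assumptions, not the DREAM design.

```python
# Hypothetical tuner sketch: balance coordination overhead against the
# granularity (extent) of control by adapting the coordination interval.

class CoordinationTuner:
    def __init__(self, target_overhead: float = 0.05,
                 min_interval_s: float = 0.1, max_interval_s: float = 10.0):
        self.target = target_overhead
        self.min_s, self.max_s = min_interval_s, max_interval_s
        self.interval_s = 1.0            # how often the scheduler re-coordinates

    def update(self, coordination_time_s: float) -> float:
        """Adapt the interval after one coordination round of the given cost."""
        overhead = coordination_time_s / (coordination_time_s + self.interval_s)
        if overhead > self.target:        # coordinating too often: back off
            self.interval_s = min(self.interval_s * 2.0, self.max_s)
        elif overhead < self.target / 2:  # plenty of headroom: control more finely
            self.interval_s = max(self.interval_s / 2.0, self.min_s)
        return self.interval_s
```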

    A Workload-Aware, Eco-Friendly Daemon for Cluster Computing

    This paper presents an eco-friendly daemon that reduces power consumption while more effectively maintaining high performance via a novel behavioral quantification of workload. Specifically, our behavioral quantification achieves a more accurate workload characterization than previous approaches by inferring "processor stall cycles due to off-chip activities." This quantification, in turn, provides a foundation upon which we construct an interval-based, power-aware, run-time algorithm that is implemented within a system-wide daemon. We then evaluate our power-aware daemon in a cluster-computing environment with the NAS Parallel Benchmarks. The results indicate that our novel behavioral quantification of workload allows our power-aware daemon to more tightly control performance while delivering substantial energy savings.
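    An interval-based daemon of this kind typically amounts to a simple control loop: sample hardware counters, estimate how much of the interval the processor spent stalled on off-chip activity, and pick the next frequency accordingly. The sketch below only illustrates that loop shape; the counter-reading and frequency-setting hooks are hypothetical placeholders, and the thresholds and frequency levels are not the paper's settings.

```python
# Hedged sketch of an interval-based, power-aware control loop: lower the CPU
# frequency when a large fraction of cycles is stalled on off-chip activity.

import time

FREQ_LEVELS_MHZ = [600, 1200, 1800, 2400]   # illustrative DVFS operating points

def pick_frequency(offchip_stall_fraction: float) -> int:
    """More off-chip stalls -> the core can run slower with little slowdown."""
    if offchip_stall_fraction > 0.6:
        return FREQ_LEVELS_MHZ[0]
    if offchip_stall_fraction > 0.4:
        return FREQ_LEVELS_MHZ[1]
    if offchip_stall_fraction > 0.2:
        return FREQ_LEVELS_MHZ[2]
    return FREQ_LEVELS_MHZ[3]

def daemon_loop(read_cycles, read_offchip_stall_cycles, set_frequency_mhz,
                interval_s: float = 0.1):
    # read_cycles / read_offchip_stall_cycles / set_frequency_mhz are assumed
    # platform-specific hooks (e.g. performance counters and a DVFS interface).
    prev_cycles, prev_stalls = read_cycles(), read_offchip_stall_cycles()
    while True:
        time.sleep(interval_s)
        cycles, stalls = read_cycles(), read_offchip_stall_cycles()
        delta_cycles = max(cycles - prev_cycles, 1)
        stall_fraction = (stalls - prev_stalls) / delta_cycles
        set_frequency_mhz(pick_frequency(stall_fraction))
        prev_cycles, prev_stalls = cycles, stalls
```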

    Design of complex integrated systems based on networks-on-chip: Trading off performance, power and reliability

    The steady advancement of microelectronics confronts design engineers with an escalating number of challenges, due to both the tiny dimensions and the enormous complexity of integrated systems. Against this background, this work deals with the Network-on-Chip (NoC) as the emerging design paradigm for coping with the diverse issues of nanotechnology. The detailed investigations within the chapters focus on the communication-centric aspects of multi-core systems, with performance, power consumption, and reliability all considered as essential design criteria.

    EClass: An execution classification approach to improving the energy-efficiency of software via machine learning

    Energy efficiency at the software level has gained much attention in the past decade. This paper presents a performance-aware frequency assignment algorithm for reducing processor energy consumption using Dynamic Voltage and Frequency Scaling (DVFS). Existing energy-saving techniques often rely on simplified predictions or domain knowledge to extract energy savings for specialized software (such as multimedia or mobile applications) or hardware (such as NPUs or sensor nodes). We present an innovative framework, known as EClass, for general-purpose DVFS processors that recognizes short and repetitive utilization patterns efficiently using machine learning. Our algorithm is lightweight and can save up to 52.9% of the energy consumption compared with the classical PAST algorithm. It achieves average savings of 9.1% when compared with an existing online learning algorithm that also utilizes statistics from the current execution only. We have simulated the algorithms on a cycle-accurate power simulator. Experimental results show that EClass can effectively save energy for real-life applications that exhibit mixed CPU utilization patterns during execution. Our research challenges an assumption held in previous work that a simple and efficient heuristic should be used to adjust the processor frequency online. Our empirical results show that an advanced algorithm such as machine learning can not only compensate for the energy needed to run it, but also outperform prior techniques based on the above assumption. © 2011 Elsevier Inc. All rights reserved.
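    The abstract does not spell out the learning model, so the sketch below only illustrates the general shape of the idea: represent recent CPU utilization as a short window, classify it against previously seen pattern classes, and map each class to a frequency. The nearest-centroid classifier and all numbers are assumptions for illustration, not the published EClass algorithm.

```python
# Rough sketch (not the published EClass algorithm): classify short utilization
# windows into pattern classes and map each class to a DVFS frequency.

import numpy as np

class PatternDVFS:
    """Nearest-centroid stand-in for a pattern classifier driving DVFS."""
    def __init__(self):
        self.centroids = []      # one utilization-window centroid per pattern class
        self.class_freq = []     # frequency (MHz) chosen for each class

    def fit(self, windows: np.ndarray, labels: np.ndarray, freq_per_class):
        """windows: (n, w) array of recent-utilization windows; labels: class id per window."""
        for cls, freq in enumerate(freq_per_class):
            self.centroids.append(windows[labels == cls].mean(axis=0))
            self.class_freq.append(freq)

    def next_frequency(self, window: np.ndarray) -> int:
        """Classify the latest utilization window and return a frequency in MHz."""
        dists = [np.linalg.norm(window - c) for c in self.centroids]
        return self.class_freq[int(np.argmin(dists))]

# Illustrative use: two synthetic pattern classes (bursty vs. steady-low load).
rng = np.random.default_rng(0)
bursty = rng.uniform(0.7, 1.0, size=(50, 8))
steady = rng.uniform(0.0, 0.3, size=(50, 8))
dvfs = PatternDVFS()
dvfs.fit(np.vstack([bursty, steady]),
         np.array([0] * 50 + [1] * 50),
         freq_per_class=[2000, 800])
print(dvfs.next_frequency(rng.uniform(0.0, 0.3, size=8)))  # likely prints 800
```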