Search CORE

1,945 research outputs found

Voltage Set-up Problem on Embedded Systems with Multiple Voltages

Author: Hua Shaoxiong
Qu Gang
Publication venue
Publication date: 10/11/2005
Field of study

Dynamic voltage scaling (DVS), arguably the most effective energy reduction technique, can be enabled by having multiple voltages physically implemented on the chip and allowing the operating system to decide which voltage to use at run-time. Indeed, this is predicted as the future low-power system by International Technology Roadmap for Semiconductors (ITRS). There still exist many important unsolved problems on how to reduce the system's dynamic and/or total power by DVS. One of such problems, which we refer to as the voltage set-up problem, is "how many levels and at which values should voltages be implemented for the system to achieve the maximum energy saving". It challenges whether DVS technique's full potential in energy saving can be reached on multiple-voltage systems. In this paper, (1) we derive analytical solutions for dual-voltage system. (2) For the general case that does not have analytic solutions, we develop efficient numerical methods that can take the overhead of voltage switch and leakage into account. (3) We demonstrate how to apply the proposed algorithms on system design. (4) Interestingly, the experimental results, on both real life DSP applications and random created applications, suggest that multiple-voltage DVS systems with only a couple levels of voltages, when set up properly, can be very close to DVS technique's full potential in energy saving. Parts of this report were published in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 13, No. 7, pp. 869-872, July 2005

Digital Repository at the University of Maryland

DVFS using clock scheduling for Multicore Systems-on-Chip and Networks-on-Chip

Author: Yadav MANOJ KUMAR
Publication venue: Politecnico di Torino
Publication date
Field of study

A modern System-on-Chip (SoC) contains processor cores, application-specific process- ing elements, memory, peripherals, all connected with a high-bandwidth and low-latency Network-on-Chip (NoC). The downside of such very high level of integration and con- nectivity is the high power consumption. In CMOS technology this is made of a dynamic and a static component. To reduce the dynamic component, Dynamic voltage and Fre- quency Scaling (DVFS) has been adopted. Although DVFS is very effective chip-wide, the power optimization of complex SoCs calls for a finer grain application of DVFS. Ideally all the main components of an SoC should be provided with a DVFS controller. An SoC with a DVFS controller per component with individual DC-DC converters and PLL/DLL circuits cannot scale in size to hundreds of components, which are in the research agenda. We present an alternative that will permit such scaling. It is possible to achieve results close to an optimum DVFS by hopping between few voltage levels and by an innovative application of clock-gating that we term as clock scheduling. We obtain an effective clock frequency by periodically killing some clock cycles of a master clock. We can apply voltage scaling for some of the periodic clock schedules which yield effective clock 1/2, 1/3, . . . By dithering between few voltages we obtain results close to an ideal DVFS system in simple pipelined circuits and in a complex example, a NoC’s switch. Again in the context of a NoC, we show how clock scheduling and voltage scaling can be automatically determined by means of a proportional-integral loop controller that keeps track of the network load. We describe in detail its implementation and all the circuit-level issues that we found. For a single switch, result shows an advantage of up to 2X over simple frequency scaling without voltage scaling. By providing each NoC’s switch with our simple DVFS controller, power saving at network level can be significantly more than what a a global DVFS controller can get. In a realistic scenario represented by network traces generated by video applications (MPEG, PIP, MWD, VoPD), we obtain an average power saving of 33%. To reduce static power, the Power-Gating (PG) technique is used and consists in switching- off power supply of unused blocks via pMOS headers or nMOS footers in series with such blocks. Even though research has been done in this field, the application of PG to NoCs has not been fully investigated. We show that it is possible to apply PG to the input buffers of a NoC switch. Their leakage power contributes about 40-50% of total NoC power, hence reducing such contribution is worthwhile. We partitioned buffers in banks and apply PG only to inactive banks. With our technique, it is possible to save about 40% in leakage power, without impact on performance

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Design of multimedia processor based on metric computation

Author: Balasa
Berekovic
Jean Luc Philippe
Jean Philippe Diguet
Mohamed Abid
Nader Ben Amor
Suzuki
Wuytack
Yannick Le Moullec
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005
Field of study

Media-processing applications, such as signal processing, 2D and 3D graphics rendering, and image compression, are the dominant workloads in many embedded systems today. The real-time constraints of those media applications have taxing demands on today's processor performances with low cost, low power and reduced design delay. To satisfy those challenges, a fast and efficient strategy consists in upgrading a low cost general purpose processor core. This approach is based on the personalization of a general RISC processor core according the target multimedia application requirements. Thus, if the extra cost is justified, the general purpose processor GPP core can be enforced with instruction level coprocessors, coarse grain dedicated hardware, ad hoc memories or new GPP cores. In this way the final design solution is tailored to the application requirements. The proposed approach is based on three main steps: the first one is the analysis of the targeted application using efficient metrics. The second step is the selection of the appropriate architecture template according to the first step results and recommendations. The third step is the architecture generation. This approach is experimented using various image and video algorithms showing its feasibility

arXiv.org e-Print Archive

Crossref

HAL-Université de Bretagne Occidentale

VBN

Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors

Author: DeHon André
Kapre Nachiket
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

Automated code generation and performance tuning techniques for concurrent architectures such as GPUs, Cell and FPGAs can provide integer factor speedups over multi-core processor organizations for data-parallel, floating-point computation in SPICE model-evaluation. Our Verilog AMS compiler produces code for parallel evaluation of non-linear circuit models suitable for use in SPICE simulations where the same model is evaluated several times for all the devices in the circuit. Our compiler uses architecture specific parallelization strategies (OpenMP for multi-core, PThreads for Cell, CUDA for GPU, statically scheduled VLIW for FPGA) when producing code for these different architectures. We automatically explore different implementation configurations (e.g. unroll factor, vector length) using our performance-tuner to identify the best possible configuration for each architecture. We demonstrate speedups of 3- 182times for a Xilinx Virtex5 LX 330T, 1.3-33times for an IBM Cell, and 3-131times for an NVIDIA 9600 GT GPU over a 3 GHz Intel Xeon 5160 implementation for a variety of single-precision device models

Crossref

Caltech Authors

DR-NTU (Digital Repository of NTU)

A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

Author: Mittal Sparsh
Publication venue
Publication date: 01/01/2014
Field of study

Recent technological advances have greatly improved the performance and features of embedded systems. With the number of just mobile devices now reaching nearly equal to the population of earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing power consumption of embedded systems. We discuss the need of power management and provide a classification of the techniques on several important parameters to highlight their similarities and differences. This paper is intended to help the researchers and application-developers in gaining insights into the working of power management techniques and designing even more efficient high-performance embedded systems of tomorrow

arXiv.org e-Print Archive

Crossref

Multiple voltage scheme with frequency variation for power minimization of pipelined circuits at high-level synthesis

Author: Radhakrishnan Bharath
Publication venue: Digital Scholarship@UNLV
Publication date: 01/01/2003
Field of study

High-Level Synthesis (HLS) is defined as a translation process from a behavioral description into structural description. The high-level synthesis process consists of three interdependent phases: scheduling, allocation and binDing The order of the three phases varies depending on the design flow. There are three important quality measures used to support design decision, namely size, performance and power consumption. Recently, with the increase in portability, the power consumption has become a very dominant factor in the design of circuits. The aim of low-power high-level synthesis is to schedule operations to minimize switching activity and select low power modules while satisfying timing constraints. This thesis presents a heuristic that helps minimize power consumption by operating the functional units at multiple voltages and varied clock frequencies. The algorithm presented here deals with pipelined operations where multiple instance of the same operation are carried out. The algorithm was implemented using C++, on LINUX platform

University of Nevada, Las Vegas Repository