Search CORE

190,351 research outputs found

Instruction prefetching techniques for ultra low-power multicore architectures

Author: Payami Maryam
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 20/12/2016
Field of study

As the gap between processor and memory speeds increases, memory latencies have become a critical bottleneck for computing performance. To reduce this bottleneck, designers have been working on techniques to hide these latencies. On the other hand, design of embedded processors typically targets low cost and low power consumption. Therefore, techniques which can satisfy these constraints are more desirable for embedded domains. While out-of-order execution, aggressive speculation, and complex branch prediction algorithms can help hide the memory access latency in high-performance systems, yet they can cost a heavy power budget and are not suitable for embedded systems. Prefetching is another popular method for hiding the memory access latency, and has been studied very well for high-performance processors. Similarly, for embedded processors with strict power requirements, the application of complex prefetching techniques is greatly limited, and therefore, a low power/energy solution is mostly desired in this context. In this work, we focus on instruction prefetching for ultra-low power processing architectures and aim to reduce energy overhead of this operation by proposing a combination of simple, low-cost, and energy efficient prefetching techniques. We study a wide range of applications from cryptography to computer vision and show that our proposed mechanisms can effectively improve the hit-rate of almost all of them to above 95%, achieving an average performance improvement of more than 2X. Plus, by synthesizing our designs using the state-of-the-art technologies we show that the prefetchers increase system’s power consumption less than 15% and total silicon area by less than 1%. Altogether, a total energy reduction of 1.9X is achieved, thanks to the proposed schemes, enabling a significantly higher battery life

AMS Tesi di Laurea

Dynamic Binary Translation for Embedded Systems with Scratchpad Memory

Author: Baiocchi Paredes Jose Americo
Publication venue
Publication date: 01/01/2011
Field of study

Embedded software development has recently changed with advances in computing. Rather than fully co-designing software and hardware to perform a relatively simple task, nowadays embedded and mobile devices are designed as a platform where multiple applications can be run, new applications can be added, and existing applications can be updated. In this scenario, traditional constraints in embedded systems design (i.e., performance, memory and energy consumption and real-time guarantees) are more difficult to address. New concerns (e.g., security) have become important and increase software complexity as well. In general-purpose systems, Dynamic Binary Translation (DBT) has been used to address these issues with services such as Just-In-Time (JIT) compilation, dynamic optimization, virtualization, power management and code security. In embedded systems, however, DBT is not usually employed due to performance, memory and power overhead. This dissertation presents StrataX, a low-overhead DBT framework for embedded systems. StrataX addresses the challenges faced by DBT in embedded systems using novel techniques. To reduce DBT overhead, StrataX loads code from NAND-Flash storage and translates it into a Scratchpad Memory (SPM), a software-managed on-chip SRAM with limited capacity. SPM has similar access latency as a hardware cache, but consumes less power and chip area. StrataX manages SPM as a software instruction cache, and employs victim compression and pinning to reduce retranslation cost and capture frequently executed code in the SPM. To prevent performance loss due to excessive code expansion, StrataX minimizes the amount of code inserted by DBT to maintain control of program execution. When a hardware instruction cache is available, StrataX dynamically partitions translated code among the SPM and main memory. With these techniques, StrataX has low performance overhead relative to native execution for MiBench programs. Further, it simplifies embedded software and hardware design by operating transparently to applications without any special hardware support. StrataX achieves sufficiently low overhead to make it feasible to use DBT in embedded systems to address important design goals and requirements

CiteSeerX

D-Scholarship@Pitt

Architecture extensions for efficient managament of scratch-pad Memory

Author: Busquets Mataix José Vicente
Catalá Carlos
Martí Campoy Antonio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

Nowadays, many embedded processors include in their architecture on-chip static memories, so called scratch-pad memories (SPM). Compared to cache, these memories do not require complex control logic, thus resulting in increased efficiency both in silicon area and energy consumption. Last years, many papers have proposed algorithms to allocate memory segments in SPM in order to enhance its usage. However, very few care about the SPM architecture itself, to make it more controllable, more power efficient and faster. This paper proposes architecture extensions to automatically load code into the SPM whilst it is fetched for execution to reduce the SPM updating delays, which motivates a very dynamic use of the SPM. We test our proposal in a derivation of the Simplescalar simulator, with typical embedded benchmarks. The results show improvements, on average, of 30.6% in energy saving and 7.6% in performance compared to a system with cache. © 2011 Springer-Verlag.This research was sponsored by local Government “Generalitat Valenciana” under project GV07/ 2007/122.Busquets Mataix, JV.; Catalá, C.; Martí Campoy, A. (2011). Architecture extensions for efficient managament of scratch-pad Memory. En Integrated Circuit and System Design. Power and Timing Modeling, Optimization, and Simulation. Springer Verlag (Germany). (6951):43-52. https://doi.org/10.1007/978-3-642-24154-3_5S43526951Banakar, R., Steinke, S., Lee, B.-S., Balakrishnan, M., Marwedel, P.: Scratchpad memory: design alternative for cache on-chip memory in embedded systems. In: CODES 2002, pp. 73–78 (2002)Verma, M., Wehmeyer, L., Marwedel, P.: Cache-Aware Scratchpad Allocation Algorithm. In: DATE 2004, pp. 1264–1269 (2004)Verma, M., Marwedel, P.: Advanced memory optimization techniques for low-power embedded processors, pp. I-XII, 1–188. Springer, Heidelberg (2007)Nguyen, N., Dominguez, A., Barua, R.: Memory allocation for embedded systems with a compile-time-unknown scratch-pad size. In: CASES 2005, pp. 115–125 (2005)Egger, B., Kim, C., Jang, C., Nam, Y., Lee, J., Min, S.L.: A dynamic code placement technique for scratchpad memory using postpass optimization. In: CASES 2006, pp. 223–233 (2006)Egger, B., Lee, J., Shin, H.: Scratchpad memory management for portable systems with a memory management unit. In: EMSOFT 2006, pp. 321–330 (2006)Egger, B., Lee, J., Shin, H.: Dynamic scratchpad memory management for code in portable systems with an MMU. ACM Trans. Embedded Comput. Syst. 7(2) (2008)Cho, H., Egger, B., Lee, J., Shin, H.: Dynamic data scratchpad memory management for a memory subsystem with an MMU. In: LCTES 2007, pp. 195–206 (2007)Janapsatya, A., Parameswaran, S., Ignjatovic, A.: Hardware/software managed scratchpad memory for embedded system. In: ICCAD 2004, pp. 370–377 (2004)Balakrishnan, M., Marwedel, P., Wehmeyer, L., Grunwald, N., Banakar, R., Steinke, S.: Reducing Energy Consumption by Dynamic Copying of Instructions onto Onchip Memory. In: ISSS 2002, pp. 213–218 (2002)Poletti, F., Marchal, P., Atienza, D., Benini, L., Catthoor, F., Mendias, J.M.: An integrated hardware/software approach for run-time scratchpad management. In: DAC 2004, pp. 238–243 (2004)Li, L., Gao, L., Xue, J.: Memory Coloring: A Compiler Approach for Scratchpad Memory Management. In: IEEE PACT 2005, pp. 329–338 (2005)Lee, L.H., Moyer, B., Arends, J.: Instruction fetch energy reduction using loop caches for embedded applications with small tight loops. In: ISLPED 1999, pp. 267–269 (1999)Victorio, J.A., Torres Moren, E.F., Yúfera, V.V.: Vatios: Simulador de Procesador con Estimación de Potencia. XVIII Jornadas de Paralelismo, Zaragoza (2007)Burger, D., Austin, T.M.: The SimpleScalar Tool Set Version 2.0. Technical Report 1342, Computer Sciences Department. University of Wisconsin–Madison (May 1997)Brooks, D., Tiwari, V., Martonosi, M.: Wattch: a framework for architectural-level power analysis and optimizations. In: ISCA 2000, pp. 83–94 (2000)Tarjan, D., Thoziyoor, S., Jouppi, N.: CACTI 4.0, P. HPL-2006- 86 20060606The Mälardalen WCET research group. The Mälardalen WCET benchmarks homepage, http://www.mrtc.mdh.se/projects/wcet/benchmarks.htmlCho, D., Pasricha, S., Issenin, I., Dutt, N.D., Ahn, M., Paek, Y.: Adaptive Scratch Pad Memory Management for Dynamic Behavior of Multimedia Applications. IEEE Trans. on CAD of Integrated Circuits and Systems (TCAD) 28(4), 554–567 (2009

Crossref

RiuNet

A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

Author: Mittal Sparsh
Publication venue
Publication date: 01/01/2014
Field of study

Recent technological advances have greatly improved the performance and features of embedded systems. With the number of just mobile devices now reaching nearly equal to the population of earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing power consumption of embedded systems. We discuss the need of power management and provide a classification of the techniques on several important parameters to highlight their similarities and differences. This paper is intended to help the researchers and application-developers in gaining insights into the working of power management techniques and designing even more efficient high-performance embedded systems of tomorrow

arXiv.org e-Print Archive

Crossref

Doctor of Philosophy

Author: Redd Bennion
Publication venue: University of Utah
Publication date: 01/01/2015
Field of study

dissertationAdvancements in process technology and circuit techniques have enabled the creation of small chemical microsystems for use in a wide variety of biomedical and sensing applications. For applications requiring a small microsystem, many components can be integrated onto a single chip. This dissertation presents many low-power circuits, digital and analog, integrated onto a single chip called the Utah Microcontroller. To guide the design decisions for each of these components, two specific microsystems have been selected as target applications: a Smart Intravaginal Ring (S-IVR) and an NO releasing catheter. Both of these applications share the challenging requirements of integrating a large variety of low-power mixed-signal circuitry onto a single chip. These applications represent the requirements of a broad variety of small low-power sensing systems. In the course of the development of the Utah Microcontroller, several unique and significant contributions were made. A central component of the Utah Microcontroller is the WIMS Microprocessor, which incorporates a low-power feature called a scratchpad memory. For the first time, an analysis of scaling trends projected that scratchpad memories will continue to save power for the foreseeable future. This conclusion was bolstered by measured data from a fabricated microcontroller. In a 32 nm version of the WIMS Microprocessor, the scratchpad memory is projected to save ~10-30% of memory access energy depending upon the characteristics of the embedded program. Close examination of application requirements informed the design of an analog-to-digital converter, and a unique single-opamp buffered charge scaling DAC was developed to minimize power consumption. The opamp was designed to simultaneously meet the varied demands of many chip components to maximize circuit reuse. Each of these components are functional, have been integrated, fabricated, and tested. This dissertation successfully demonstrates that the needs of emerging small low-power microsystems can be met in advanced process nodes with the incorporation of low-power circuit techniques and design choices driven by application requirements

The University of Utah: J. Willard Marriott Digital Library

Energy Saving Techniques for Phase Change Memory (PCM)

Author: Mittal Sparsh
Publication venue
Publication date: 15/09/2013
Field of study

In recent years, the energy consumption of computing systems has increased and a large fraction of this energy is consumed in main memory. Towards this, researchers have proposed use of non-volatile memory, such as phase change memory (PCM), which has low read latency and power; and nearly zero leakage power. However, the write latency and power of PCM are very high and this, along with limited write endurance of PCM present significant challenges in enabling wide-spread adoption of PCM. To address this, several architecture-level techniques have been proposed. In this report, we review several techniques to manage power consumption of PCM. We also classify these techniques based on their characteristics to provide insights into them. The aim of this work is encourage researchers to propose even better techniques for improving energy efficiency of PCM based main memory.Comment: Survey, phase change RAM (PCRAM

arXiv.org e-Print Archive

CiteSeerX