
    DTAPO: Dynamic thermal-aware performance optimization for dark silicon many-core systems

    Future many-core systems need to handle high power density and chip temperature effectively. Some cores in many-core systems need to be turned off, or kept ‘dark’, to manage chip power and thermal density. This phenomenon is known as the dark silicon problem. It prevents many-core systems from utilizing their large number of processing cores for improved performance. This paper presents a dynamic thermal-aware performance optimization (DTaPO) technique for optimizing the performance of dark silicon many-core systems under a temperature constraint. The proposed technique utilizes both task migration and dynamic voltage and frequency scaling (DVFS) to optimize the performance of a many-core system while keeping the system temperature within a safe operating limit. Task migration puts hot cores into low-power states and moves tasks to cooler dark cores, aggressively reducing chip temperature while maintaining high overall system performance. To reduce the task migration overhead due to cold start, the source core (i.e., the active core) keeps its L2 cache content during the initial migration phase, and the destination core (i.e., the dark core) can access it to reduce the impact of cold-start misses. Moreover, the proposed technique limits task migration to cores that share the last-level cache (LLC). In the case of a major thermal violation when no cooler cores are available, DVFS is used to reduce the hot cores' temperature gradually by lowering their frequency. Experimental results for different threshold temperatures show that DTaPO keeps the average system temperature below the thermal limit, while the execution time penalty is reduced by up to 18% compared with using only DVFS, for all thermal thresholds. Moreover, the average peak temperature is reduced by up to 10.8 °C. In addition, the experimental results show that DTaPO improves the system's performance by up to 80% compared to optimal sprinting patterns (OSP) and reduces the temperature by up to 13.6 °C.
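
    As a rough illustration of the control policy this abstract describes, the sketch below combines migration-first cooling with a DVFS fallback. The core model, temperature threshold, and frequency steps are invented for illustration; this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional

DVFS_LEVELS_GHZ = [2.0, 1.6, 1.2, 0.8]  # assumed frequency steps

@dataclass
class Core:
    llc_id: int                 # which last-level cache the core sits behind
    temp: float                 # current temperature (degrees C)
    freq: float = 2.0
    dark: bool = True           # power-gated ('dark'), i.e., no task assigned
    task: Optional[str] = None

def dtapo_step(cores, t_threshold):
    """One control epoch: migrate work away from hot cores, else apply DVFS."""
    for core in cores:
        if core.dark or core.temp <= t_threshold:
            continue                              # within the safe limit
        # Prefer a cooler dark core behind the same LLC, so the source core's
        # retained L2 content stays reachable and cold-start misses are cheap.
        candidates = [c for c in cores
                      if c.llc_id == core.llc_id and c.dark and c.temp < core.temp]
        if candidates:
            target = min(candidates, key=lambda c: c.temp)
            target.task, core.task = core.task, None
            target.dark, core.dark = False, True  # hot core enters a low-power state
        elif core.freq != DVFS_LEVELS_GHZ[-1]:
            # Major thermal violation, no cooler core: step the frequency down.
            core.freq = DVFS_LEVELS_GHZ[DVFS_LEVELS_GHZ.index(core.freq) + 1]

cores = [Core(0, 92.0, dark=False, task="job0"), Core(0, 55.0)]
dtapo_step(cores, t_threshold=80.0)
print(cores[1].task)  # 'job0' migrated to the cooler dark core behind the same LLC
```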

    New Logic-In-Memory Paradigms: An Architectural and Technological Perspective

    Processing systems are in continuous evolution thanks to constant technological advancement and architectural progress. Over the years, computing systems have become more and more powerful, providing support for applications, such as Machine Learning, that require high computational power. However, the growing complexity of modern computing units and applications has had a strong impact on power consumption. In addition, the memory plays a key role in the overall power consumption of the system, especially for data-intensive applications, which require a lot of data movement between the memory and the computing unit. The consequence is twofold: memory accesses are expensive in terms of energy, and a lot of time is wasted accessing the memory rather than processing, because of the performance gap that exists between memories and processing units. This gap is known as the memory wall or the von Neumann bottleneck and is due to the different rates of progress of complementary metal-oxide-semiconductor (CMOS) technology and memories. Moreover, CMOS scaling itself is reaching a limit beyond which further progress will not be possible. This work addresses all these problems from an architectural and technological point of view by: (1) proposing a novel Configurable Logic-in-Memory Architecture that exploits the in-memory computing paradigm to mitigate the memory wall problem while also providing high performance thanks to its flexibility and parallelism; (2) exploring a non-CMOS technology as a candidate technology for the Logic-in-Memory paradigm.
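
    To make the paradigm concrete, here is a minimal software analogy of computing inside the memory array: simple logic attached to each row answers a query in place, so only the result crosses the memory/CPU boundary. The toy array and its match operation are illustrative assumptions, not the proposed architecture.

```python
class LiMArray:
    """Toy memory with per-row comparators, as in a content-addressable search."""
    def __init__(self, words):
        self.rows = list(words)

    def match(self, key, mask=0xFFFFFFFF):
        """All rows are compared against `key` 'simultaneously'; only the
        small vector of hit indices travels back to the processor."""
        return [i for i, w in enumerate(self.rows) if (w ^ key) & mask == 0]

mem = LiMArray([0xDEAD, 0xBEEF, 0xDEAD, 0x1234])
print(mem.match(0xDEAD))  # [0, 2] - one in-memory query instead of four reads
```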

    Energy harvesting towards self-powered IoT devices

    The internet of things (IoT) manages a large infrastructure of web-enabled smart devices: small devices that use embedded systems, such as processors, sensors, and communication hardware, to collect, send, and process data acquired from their environment. From a practical point of view, such devices are built from power-efficient, scalable, and lightweight nodes that need power and batteries to operate. It is therefore clear that energy harvesting plays an important role in increasing the efficiency and lifetime of IoT devices. Moreover, by acquiring energy from the surrounding operational environment, energy harvesting makes the IoT device network more sustainable from an environmental point of view. Different state-of-the-art energy harvesters based on mechanical, aeroelastic, wind, solar, radiofrequency, and pyroelectric mechanisms are discussed in this review article. Power management integrated circuits (PMICs) play a vital role in reducing the power drawn from the batteries and thus help to enhance the system's life span; PMICs from different manufacturers that provide power management to IoT devices are also discussed in this paper. Furthermore, energy harvesting networks can be exposed to prominent security issues that put the secrecy of the system at risk. These possible attacks are also discussed in this review article.
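
    As a back-of-the-envelope companion to this review, the sketch below expresses the energy-neutrality condition a harvester/PMIC pair must satisfy: on average, usable harvested power must cover the node's duty-cycled consumption. All figures are illustrative assumptions, not measured values from the article.

```python
def energy_neutral(harvest_mw, pmic_efficiency, active_mw, sleep_mw, duty_cycle):
    """True if average harvested power (after PMIC conversion loss) covers the
    node's duty-cycled average consumption."""
    usable_mw = harvest_mw * pmic_efficiency
    load_mw = duty_cycle * active_mw + (1 - duty_cycle) * sleep_mw
    return usable_mw >= load_mw

# Example: 5 mW harvested, 85% PMIC efficiency, 60 mW active, 0.05 mW sleep,
# node awake 5% of the time -> 4.25 mW usable vs ~3.05 mW consumed.
print(energy_neutral(5.0, 0.85, 60.0, 0.05, duty_cycle=0.05))  # True
```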

    Embedded Machine Learning: Emphasis on Hardware Accelerators and Approximate Computing for Tactile Data Processing

    Machine Learning (ML), a subset of Artificial Intelligence (AI), is driving the industrial and technological revolution of the present and future. We envision a world with smart devices that are able to mimic human behavior (sense, process, and act) and perform tasks that we once thought could only be carried out by humans. The vision is to achieve such a level of intelligence with affordable, power-efficient, and fast hardware platforms. However, embedding machine learning algorithms in many application domains, such as the internet of things (IoT), prostheses, robotics, and wearable devices, is an ongoing challenge, one governed by the computational complexity of ML algorithms, the performance and availability of hardware platforms, and the application's budget (power constraints, real-time operation, etc.). In this dissertation, we focus on the design and implementation of efficient ML algorithms to handle these challenges. First, we apply Approximate Computing Techniques (ACTs) to reduce the computational complexity of ML algorithms. Then, we design custom Hardware Accelerators to improve the performance of the implementation within a specified budget. Finally, a tactile data processing application is adopted for the validation of the proposed exact and approximate embedded machine learning accelerators. The dissertation starts with an introduction to the various ML algorithms used for tactile data processing. These algorithms are assessed in terms of their computational complexity and the available hardware platforms that could be used for implementation. Afterward, a survey of existing approximate computing techniques and hardware accelerator design methodologies is presented. Based on the findings of the survey, an approach for applying algorithmic-level ACTs to machine learning algorithms is provided. Then three novel hardware accelerators are proposed: (1) a k-Nearest Neighbor (kNN) accelerator based on a selection-based sorter, (2) a Tensorial Support Vector Machine (TSVM) accelerator based on Shallow Neural Networks, and (3) a Hybrid-Precision Binary Convolutional Neural Network (BCNN) accelerator. The three accelerators offer real-time classification with substantial reductions in hardware resources and power consumption compared to existing implementations targeting the same tactile data processing application on FPGA. Moreover, the approximate accelerators maintain high classification accuracy, with a loss of at most 5%.
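
    A software-level sketch of the first accelerator's idea: kNN only needs the k smallest distances, so a selection step (here `heapq.nsmallest`) replaces a full sort, mirroring the selection-based sorter in hardware. The data shapes, labels, and k are illustrative assumptions, not the dissertation's dataset.

```python
import heapq
from collections import Counter

def knn_classify(train, labels, sample, k=3):
    """Classify `sample` by majority vote among its k nearest training points.
    Squared Euclidean distance avoids the sqrt, as hardware versions often do."""
    dists = ((sum((a - b) ** 2 for a, b in zip(x, sample)), y)
             for x, y in zip(train, labels))
    nearest = heapq.nsmallest(k, dists)          # selection, not a full sort
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [(0.0, 0.1), (0.2, 0.0), (1.0, 0.9), (0.9, 1.1)]
labels = ["touch", "touch", "slide", "slide"]
print(knn_classify(train, labels, (0.95, 1.0), k=3))  # 'slide'
```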

    Harvesting-aware energy management for environmental monitoring WSN

    Wireless sensor networks can be used to collect data in remote locations, especially when energy harvesting is used to extend the lifetime of individual nodes. However, in order to use the collected energy most effectively, its consumption must be managed. In this work, forecasts of diurnal solar energies were made based on measurements of atmospheric pressure. These forecasts were used as part of an adaptive duty cycling scheme for node level energy management. This management was realized with a fuzzy logic controller that has been tuned using differential evolution. Controllers were created using one and two days of energy forecasts, then simulated in software. These controllers outperformed a human-created reference controller by taking more measurements while using less reserve energy during the simulated period. The energy forecasts were comparable to other available methods, while the method of tuning the fuzzy controller improved overall node performance. The combination of the two is a promising method of energy management.
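
    A simplified sketch of harvesting-aware duty cycling as described above: the node scales its measurement activity with the forecast energy income and its reserve level. The paper tunes a fuzzy logic controller with differential evolution; the plain proportional rule and all constants below are only an illustrative stand-in.

```python
def duty_cycle(forecast_j, reserve_j, budget_j_per_day=50.0,
               reserve_target_j=200.0, d_min=0.01, d_max=0.5):
    """Return the fraction of time to stay awake during the next day."""
    income = forecast_j / budget_j_per_day   # > 1 means an energy surplus
    health = reserve_j / reserve_target_j    # < 1 means a drained reserve
    d = d_max * min(income, 1.0) * min(health, 1.0)
    return max(d_min, min(d_max, d))

# Gloomy forecast and partly drained reserve -> the node throttles itself.
print(duty_cycle(forecast_j=40.0, reserve_j=150.0))  # 0.3
```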

    Time- and Amplitude-Controlled Power Noise Generator against SPA Attacks for FPGA-Based IoT Devices

    Power noise generation for masking power traces is a powerful countermeasure against Simple Power Analysis (SPA), and it has also been used against Differential Power Analysis (DPA) and Correlation Power Analysis (CPA) in the case of cryptographic circuits. This technique makes use of power consumption generators as basic modules, which are usually based on ring oscillators when implemented on FPGAs. These modules can be used to generate power noise and also to extract digital signatures through the power side channel for Intellectual Property (IP) protection purposes. In this paper, a new power consumption generator, named Xored High Consuming Module (XHCM), is proposed. Compared to other proposals in the literature, XHCM improves the amount of current consumption per LUT when implemented on FPGAs. Experimental results show that these modules can achieve current increments ranging from 2.4 mA (with only 16 LUTs on Artix-7 devices, with a power consumption density of 0.75 mW/LUT when using a single HCM) to 11.1 mA (with 67 LUTs when using 8 XHCMs, with a power consumption density of 0.83 mW/LUT). Moreover, a version controlled by Pulse-Width Modulation (PWM) has been developed, named PWM-XHCM, which is, like XHCM, suitable for power watermarking. As a countermeasure against SPA attacks, a multi-level XHCM (ML-XHCM) is also presented, which is capable of generating different power consumption levels with minimal area overhead (27 six-input LUTs for generating 16 different amplitude levels on Artix-7 devices). Finally, a randomized version, named RML-XHCM, has also been developed using two True Random Number Generators (TRNGs) to generate current consumption peaks with random amplitudes at random times; RML-XHCM requires fewer than 150 LUTs on Artix-7 devices. In summary, this article makes two main contributions: first, XHCM and PWM-XHCM provide an efficient power consumption generator for extracting digital signatures through the power side channel; second, ML-XHCM and RML-XHCM are powerful tools for protecting processing units against SPA attacks in IoT devices implemented on FPGAs.
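
    A conceptual, software-only rendering of what RML-XHCM does: inject consumption peaks of random amplitude at random times so a captured power trace no longer correlates with the protected computation. The real design uses TRNGs and ring oscillators in hardware; the sampling model, level count, and peak current below are illustrative assumptions.

```python
import random

N_LEVELS = 16  # ML-XHCM exposes multiple discrete amplitude levels

def noisy_trace(real_trace_ma, peak_ma=11.1, p_inject=0.3, rng=random.Random(0)):
    """Overlay random-amplitude noise peaks on a simulated power trace (mA)."""
    out = []
    for sample in real_trace_ma:
        if rng.random() < p_inject:              # random time (TRNG-driven)
            level = rng.randrange(N_LEVELS)      # random amplitude level
            sample += peak_ma * level / (N_LEVELS - 1)
        out.append(sample)
    return out

# The SPA-relevant spike at 7.9 mA is hidden among injected peaks.
print(noisy_trace([3.0, 3.2, 7.9, 3.1]))
```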

    RISC-Vlim, a RISC-V Framework for Logic-in-Memory Architectures

    Most modern CPU architectures are based on the von Neumann principle, where memory and processing units are separate entities. Although processing unit performance has improved over the years, memory capacity has not followed the same trend, creating a performance gap between them. This problem, known as the "memory wall", severely limits the performance of a microprocessor. One of the most promising solutions is the "logic-in-memory" approach, which consists of merging memory and logic units so that data can be processed directly inside the memory itself. Here we propose a RISC-V framework that supports logic-in-memory operations. We substitute the data memory with a circuit capable of storing data and performing in-memory computation. The framework is based on a standard memory interface, so different logic-in-memory architectures, based on both CMOS and emerging technologies, can be inserted into the microprocessor. The main advantage of this framework is the possibility of comparing the performance of different logic-in-memory solutions on code execution. We demonstrate the effectiveness of the framework using a CMOS volatile memory and a memory based on a new emerging technology, racetrack logic. The results demonstrate an improvement in algorithm execution speed and a reduction in energy consumption.
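
    An illustrative sketch of the framework's key idea: the data memory is replaced by a component that honours ordinary load/store requests but can also compute on stored words behind the same interface. The addresses, opcode encoding, and operation set are assumptions for illustration, not the actual RISC-Vlim interface.

```python
class LiMMemory:
    """Drop-in data memory: plain loads/stores, plus in-memory operations
    triggered through the same interface the core already uses."""
    OPS = {0: lambda a, b: a & b, 1: lambda a, b: a | b, 2: lambda a, b: a ^ b}

    def __init__(self, size):
        self.mem = [0] * size

    def store(self, addr, value):
        self.mem[addr] = value & 0xFFFFFFFF

    def load(self, addr):
        return self.mem[addr]

    def lim_op(self, opcode, dst, src1, src2):
        """In-memory compute: operands and result never leave the memory."""
        self.mem[dst] = self.OPS[opcode](self.mem[src1], self.mem[src2])

dmem = LiMMemory(64)
dmem.store(0, 0xF0F0)
dmem.store(1, 0x0FF0)
dmem.lim_op(2, dst=2, src1=0, src2=1)  # XOR computed inside the memory
print(hex(dmem.load(2)))               # 0xff00
```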

    A Survey of Fault-Tolerance Techniques for Embedded Systems from the Perspective of Power, Energy, and Thermal Issues

    Relentless technology scaling has provided a significant increase in processor performance but, on the other hand, has had adverse impacts on system reliability. In particular, technology scaling increases the processor's susceptibility to radiation-induced transient faults. Moreover, technology scaling combined with the discontinuation of Dennard scaling increases power densities, and thereby temperatures, on the chip. High temperature, in turn, accelerates transistor aging mechanisms, which may ultimately lead to permanent faults on the chip. Fault-tolerance techniques have emerged to assure reliable system operation despite these potential reliability concerns. Specifically, fault-tolerance techniques employ some kind of redundancy to satisfy specific reliability requirements. However, integrating fault-tolerance techniques into real-time embedded systems makes it harder to preserve timing constraints. As a remedy, many task mapping/scheduling policies have been proposed that consider the integration of fault-tolerance techniques and enforce both timing and reliability guarantees for real-time embedded systems. More advanced techniques additionally aim to minimize power and energy while satisfying timing and reliability constraints. Recently, some scheduling techniques have started to tackle a new challenge: the temperature increase induced by employing fault-tolerance techniques. These emerging techniques aim to satisfy temperature constraints besides timing and reliability constraints. This paper provides an in-depth survey of the emerging research efforts that exploit fault-tolerance techniques while considering timing, power/energy, and temperature from the real-time embedded systems' design perspective. In particular, the task mapping/scheduling policies for fault-tolerant real-time embedded systems are reviewed and classified according to their considered goals and constraints. Moreover, the employed fault-tolerance techniques, application models, and hardware models are considered as additional dimensions of the presented classification. Lastly, this survey gives deep insights into the main achievements and shortcomings of the existing approaches and highlights the most promising ones.
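
    As a worked micro-example of the timing/reliability trade-off the surveyed policies navigate, the check below asks whether k recovery re-executions of a task (a common time-redundancy technique) still meet its deadline, and what task reliability they buy. The fault model (independent transient faults per execution) and all numbers are illustrative assumptions, not drawn from the survey.

```python
def reexecution_feasible(wcet_ms, deadline_ms, k):
    """Time-redundancy check: the primary run plus k recoveries fit the deadline."""
    return (k + 1) * wcet_ms <= deadline_ms

def reliability_with_recoveries(p_fault, k):
    """The task fails only if the primary and all k recoveries are faulty."""
    return 1.0 - p_fault ** (k + 1)

for k in range(3):
    print(k, reexecution_feasible(wcet_ms=10, deadline_ms=35, k=k),
          reliability_with_recoveries(p_fault=1e-3, k=k))
# k = 0, 1, 2 all fit (10, 20, 30 ms <= 35 ms); reliability climbs from
# 0.999 to ~1 - 1e-9, at the cost of reserved processor time (and heat).
```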

    Slowing down for performance and energy: an OS-centric study in network driven workloads

    This paper studies three fundamental aspects of an OS that impact the performance and energy efficiency of network processing: 1) batching, 2) processor energy settings, and 3) the logic and instructions of the OS networking paths. A network device's interrupt delay feature is used to induce batching, and the processor frequency is manipulated to control the speed of instruction execution. A baremetal library OS is used to explore OS path specialization. This study shows how careful use of batching and interrupt delay results in 2X energy and performance improvements across different workloads. Surprisingly, we find that polling can be made energy efficient and can result in gains of up to 11X over baseline Linux. We developed a methodology and a set of tools to collect system data in order to understand how energy is impacted at a fine granularity. This paper identifies a number of other novel findings that have implications for OS design for networked applications and suggests a path forward that considers energy as a focal point of systems research.
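
    A rough model of the batching effect the study measures: delaying interrupts amortizes a fixed per-interrupt OS path cost over many packets, which can be traded against added latency. The per-interrupt and per-packet costs below are illustrative assumptions, not the paper's measurements.

```python
def per_packet_cost_us(batch, irq_cost_us=5.0, pkt_cost_us=1.0):
    """Average CPU time per packet when one interrupt covers `batch` packets."""
    return irq_cost_us / batch + pkt_cost_us

for batch in (1, 8, 64):
    print(f"batch={batch:3d}  cpu/pkt={per_packet_cost_us(batch):.2f} us")
# batch=1: 6.00 us/pkt; batch=64: 1.08 us/pkt -> at equal offered load the
# processor idles (or sleeps) longer, which is where the energy savings come from.
```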