Search CORE

9 research outputs found

Power and Reliability Management of SoCs

Author: De Micheli Giovanni
Mihic Kresimir
Simunic Rosing Tajana
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007
Field of study

Today's embedded systems integrate multiple IP cores for processing, communication, and sensing on a single die as systems-on-chip (SoCs). Aggressive transistor scaling, decreased voltage margins and increased processor power and temperature have made reliability assessment a much more significant issue. Although reliability of devices and interconnect has been broadly studied, in this work, we study a tradeoff between reliability and power consumption for component-based SoC designs. We specifically focus on hard error rates as they cause a device to permanently stop operating. We also present a joint reliability and power management optimization problem whose solution is an optimal management policy. When careful joint policy optimization is performed, we obtain a significant improvement in energy consumption (40%) in tandem with meeting a reliability constraint for all SoC operating temperatures

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Fault tolerance at system level based on RADIC architecture

Author: Castro León Marcela
Luque Emilio
Meyer Hugo Daniel
Rexachs del Rosario Dolores Isabel
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015
Field of study

The increasing failure rate in High Performance Computing encourages the investigation of fault tolerance mechanisms to guarantee the execution of an application in spite of node faults. This paper presents an automatic and scalable fault tolerant model designed to be transparent for applications and for message passing libraries. The model consists of detecting failures in the communication socket caused by a faulty node. In those cases, the affected processes are recovered in a healthy node and the connections are reestablished without losing data. The Redundant Array of Distributed Independent Controllers architecture proposes a decentralized model for all the tasks required in a fault tolerance system: protection, detection, recovery and masking. Decentralized algorithms allow the application to scale, which is a key property for current HPC system. Three different rollback recovery protocols are defined and discussed with the aim of offering alternatives to reduce overhead when multicore systems are used. A prototype has been implemented to carry out an exhaustive experimental evaluation through Master/Worker and Single Program Multiple Data execution models. Multiple workloads and an increasing number of processes have been taken into account to compare the above mentioned protocols. The executions take place in two multicore Linux clusters with different socket communications libraries

Elsevier - Publisher Connector

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Diposit Digital de Documents de la UAB

Reliability-Aware Design for Nanometer-Scale Devices, January 2008

Author: Atienza David
Ayala José L.
Benini Luca
De Micheli Giovanni
DeBole Michael
Narayanan Vijay
Pablo G.
Valle Del
Publication venue
Publication date: 13/02/2008
Field of study

Continuous transistor scaling due to improvements in CMOS devices and manufacturing technologies is increasing processor power densities and temperatures; thus, creating challenges to maintain manufacturing yield rates and reliable devices in their expected lifetimes for latest nanometer-scale dimensions. In fact, new system and processor microarchitectures require new reliability-aware design methods and exploration tools that can face these challenges without significantly increasing manufacturing cost, reducing system performance or imposing large area overheads due to redundancy. In this paper we overview the latest approaches in reliability modeling and variability-tolerant design for latest technology nodes, and advocate the need of reliability-aware design for forthcoming consumer electronics. Moreover, we illustrate with a case study of an embedded processor that effec- tive reliability-aware design can be achieved in nanometer-scale devices through integral design approaches that covers modeling and exploration of reliability effects, and hardware-software architectural techniques to provide reliability-enhanced solutions at both microarchitectural- and system-level

Infoscience - École polytechnique fédérale de Lausanne

Dynamic thermal management in 3D multicore architectures

Author: Atienza Alonso David
Coskun Ayse Kivilcim
Leblebici Yusuf
Luis José
Rodrigo Ayala
Simunic Rosing Tajana
Publication venue
Publication date: 07/04/2011
Field of study

Technology scaling has caused the feature sizes to shrink continuously, whereas interconnects, unlike transistors, have not followed the same trend. Designing 3D stack architectures is a recently proposed approach to overcome the power consumption and delay problems associated with the interconnects by reducing the length of the wires going across the chip. However, 3D integration introduces serious thermal challenges due to the high power density resulting from placing computational units on top of each other. In this work, we first investigate how the existing thermal management, power management and job scheduling policies affect the thermal behavior in 3D chips. We then propose a dynamic thermally-aware job scheduling technique for 3D systems to reduce the thermal problems at very low performance cost. Our technique can also be integrated with power management policies to reduce energy consumption while avoiding the thermal hot spots and large temperature variations

Infoscience - École polytechnique fédérale de Lausanne

Proactive temperature balancing for low cost thermal management in MPSoCs

Author: Ayse Kivilcim
Coskun Tajana
Kenny C. Gross
Simunic Rosing
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

Abstract — Designing thermal management strategies that reduce the impact of hot spots and on-die temperature variations at low performance cost is a very significant challenge for multiprocessor system-on-chips (MPSoCs). In this work, we present a proactive MPSoC thermal man-agement approach, which predicts the future temperature and adjusts the job allocation on the MPSoC to minimize the impact of thermal hot spots and temperature variations without degrading performance. In addition, we implement and compare several reactive and proactive management strategies, and demonstrate that our proactive temperature-aware MPSoC job allocation technique is able to dramatically reduce the adverse effects of temperature at very low performance cost. We show experimental results using a simulator as well as an implementation on an UltraSPARC T1 system. I

CiteSeerX

Crossref

Dynamic thermal management in 3D multicore architectures

Author
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

Energy reconstruction on the LHC ATLAS TileCal upgraded front end: feasibility study for a sROD co-processing unit

Author: Cox Mitchell Arij
Publication venue
Publication date: 10/05/2016
Field of study

Dissertation presented in ful lment of the requirements for the degree of: Master of Science in Physics 2016The Phase-II upgrade of the Large Hadron Collider at CERN in the early 2020s will enable an order of magnitude increase in the data produced, unlocking the potential for new physics discoveries. In the ATLAS detector, the upgraded Hadronic Tile Calorimeter (TileCal) Phase-II front end read out system is currently being prototyped to handle a total data throughput of 5.1 TB/s, from the current 20.4 GB/s. The FPGA based Super Read Out Driver (sROD) prototype must perform an energy reconstruction algorithm on 2.88 GB/s raw data, or 275 million events per second. Due to the very high level of pro ciency required and time consuming nature of FPGA rmware development, it may be more e ective to implement certain complex energy reconstruction and monitoring algorithms on a general purpose, CPU based sROD co-processor. Hence, the feasibility of a general purpose ARM System on Chip based co-processing unit (PU) for the sROD is determined in this work. A PCI-Express test platform was designed and constructed to link two ARM Cortex-A9 SoCs via their PCI-Express Gen-2 x1 interfaces. Test results indicate that the latency of the PCI-Express interface is su ciently low and the data throughput is superior to that of alternative interfaces such as Ethernet, for use as an interconnect for the SoCs to the sROD. CPU performance benchmarks were performed on ve ARM development platforms to determine the CPU integer, oating point and memory system performance as well as energy e ciency. To complement the benchmarks, Fast Fourier Transform and Optimal Filtering (OF) applications were also tested. Based on the test results, in order for the PU to process 275 million events per second with OF, within the 6 s timing budget of the ATLAS triggering system, a cluster of three Tegra-K1, Cortex-A15 SoCs connected to the sROD via a Gen-2 x8 PCI-Express interface would be suitable. A high level design for the PU is proposed which surpasses the requirements for the sROD co-processor and can also be used in a general purpose, high data throughput system, with 80 Gb/s Ethernet and 15 GB/s PCI-Express throughput, using four X-Gene SoCs

Wits Institutional Repository on DSPACE

Classification of Resilience Techniques Against Functional Errors at Higher Abstraction Layers of Digital Systems

Author: Atienza David
Catthoor Francky
Gemmeke Tobias
Noll Tobias G.
Psychou Georgia
Rodopoulos Dimitrios
Sabry Mohamed M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017
Field of study

Nanoscale technology nodes bring reliability concerns back to the center stage of digital system design. A systematic classification of approaches that increase system resilience in the presence of functional hardware (HW)-induced errors is presented, dealing with higher system abstractions, such as the (micro) architecture, the mapping, and platform software (SW). The field is surveyed in a systematic way based on nonoverlapping categories, which add insight into the ongoing work by exposing similarities and differences. HW and SW solutions are discussed in a similar fashion so that interrelationships become apparent. The presented categories are illustrated by representative literature examples to illustrate their properties. Moreover, it is demonstrated how hybrid schemes can be decomposed into their primitive components

Infoscience - École polytechnique fédérale de Lausanne

Publikationsserver der RWTH Aachen University

Estimation à haut-niveau des dégradations temporelles dans les processeurs (méthodologie et mise en oeuvre logicielle)

Author: BERTOLINI Clément
MARC François
Publication venue
Publication date: 01/01/2013
Field of study

Actuellement, les circuits numériques nécessitent d'être de plus en plus performants. Aussi, les produits doivent être conçus le plus rapidement possible afin de gagner les précieuses parts de marché. Les méthodes rapides de conception et l'utilisation de MPSoC ont permis de satisfaire à ces exigences, mais sans tenir compte précisément de l'impact du vieillissement des circuits sur la conception. Or les MPSoC utilisent les technologies de fabrication les plus récentes et sont de plus en plus soumis aux défaillances matérielles. De nos jours, les principaux mécanismes de défaillance observés dans les transistors des MPSoC sont le HCI et le NBTI. Des marges sont alors ajoutées pour que le circuit soit fonctionnel pendant son utilisation, en considérant le cas le plus défavorable pour chaque mécanisme. Ces marges deviennent de plus en plus importantes et diminuent les performances attendues. C'est pourquoi les futures méthodes de conception nécessitent de tenir compte des dégradations matérielles en fonction de l utilisation du circuit. Dans cette thèse, nous proposons une méthode originale pour simuler le vieillissement des MPSoC à haut niveau d'abstraction. Cette méthode s'applique lors de la conception du système c.-à-d. entre l'étape de définition des spécifications et la mise en production. Un modèle empirique permet d'estimer les dégradations temporelles en fin de vie d'un circuit. Un exemple d'application est donné pour un processeur embarqué et les résultats pour un ensemble d'applications sont reportés. La solution proposée permet d'explorer différentes configurations d'une architecture MPSoC pour comparer le vieillissement. Aussi, l'application la plus sévère pour le vieillissement peut être identifiée.Nowadays, more and more performance is expected from digital circuits. What s more, the market requires fast conception methods, in order to propose the newest technology available. Fast conception methods and the utilization of MPSoC have enabled high performance and short time-to-market while taking little attention to aging. However, MPSoC are more and more prone to hardware failures that occur in transistors. Today, the prevailing failure mechanisms in MPSoC are HCI and NBTI. Margins are usually added on new products to avoid failures during execution, by considering worst case scenario for each mechanism. For the newest technology, margins are becoming more and more important and products performance is getting lower and lower. That s why the conception needs to take into account hardware failures according to the execution of software. This thesis propose a new methodology to simulate aging at high level of abstraction, which can be applied to MPSoC. The method can be applied during product conception, between the specification phase and the production. An empirical model is used to estimate slack time at circuit's end of life. A use case is conducted on an embedded processor and degradation results are reported for a set of applications. The solution enables architecture exploration and MPSoC aging can thus be compared. The software with most severe impact on aging can also be determined.BORDEAUX1-Bib.electronique (335229901) / SudocSudocFranceF

OpenGrey Repository