8 research outputs found

    Design for Reliability and Low Power in Emerging Technologies

    Get PDF
    Die fortlaufende Verkleinerung von Transistor-StrukturgrĂ¶ĂŸen ist einer der wichtigsten Antreiber fĂŒr das Wachstum in der Halbleitertechnologiebranche. Seit Jahrzehnten erhöhen sich sowohl Integrationsdichte als auch KomplexitĂ€t von Schaltkreisen und zeigen damit einen fortlaufenden Trend, der sich ĂŒber alle modernen FertigungsgrĂ¶ĂŸen erstreckt. Bislang ging das Verkleinern von Transistoren mit einer Verringerung der Versorgungsspannung einher, was zu einer Reduktion der Leistungsaufnahme fĂŒhrte und damit eine gleichbleibenden Leistungsdichte sicherstellte. Doch mit dem Beginn von StrukturgrĂ¶ĂŸen im Nanometerbreich verlangsamte sich die fortlaufende Skalierung. Viele Schwierigkeiten, sowie das Erreichen von physikalischen Grenzen in der Fertigung und Nicht-IdealitĂ€ten beim Skalieren der Versorgungsspannung, fĂŒhrten zu einer Zunahme der Leistungsdichte und, damit einhergehend, zu erschwerten Problemen bei der Sicherstellung der ZuverlĂ€ssigkeit. Dazu zĂ€hlen, unter anderem, Alterungseffekte in Transistoren sowie ĂŒbermĂ€ĂŸige Hitzeentwicklung, nicht zuletzt durch stĂ€rkeres Auftreten von Selbsterhitzungseffekten innerhalb der Transistoren. Damit solche Probleme die ZuverlĂ€ssigkeit eines Schaltkreises nicht gefĂ€hrden, werden die internen Signallaufzeiten ĂŒblicherweise sehr pessimistisch kalkuliert. Durch den so entstandenen zeitlichen Sicherheitsabstand wird die korrekte FunktionalitĂ€t des Schaltkreises sichergestellt, allerdings auf Kosten der Performance. Alternativ kann die ZuverlĂ€ssigkeit des Schaltkreises auch durch andere Techniken erhöht werden, wie zum Beispiel durch Null-Temperatur-Koeffizienten oder Approximate Computing. Wenngleich diese Techniken einen Großteil des ĂŒblichen zeitlichen Sicherheitsabstandes einsparen können, bergen sie dennoch weitere Konsequenzen und Kompromisse. Bleibende Herausforderungen bei der Skalierung von CMOS Technologien fĂŒhren außerdem zu einem verstĂ€rkten Fokus auf vielversprechende Zukunftstechnologien. Ein Beispiel dafĂŒr ist der Negative Capacitance Field-Effect Transistor (NCFET), der eine beachtenswerte Leistungssteigerung gegenĂŒber herkömmlichen FinFET Transistoren aufweist und diese in Zukunft ersetzen könnte. Des Weiteren setzen Entwickler von Schaltkreisen vermehrt auf komplexe, parallele Strukturen statt auf höhere Taktfrequenzen. Diese komplexen Modelle benötigen moderne Power-Management Techniken in allen Aspekten des Designs. Mit dem Auftreten von neuartigen Transistortechnologien (wie zum Beispiel NCFET) mĂŒssen diese Power-Management Techniken neu bewertet werden, da sich AbhĂ€ngigkeiten und VerhĂ€ltnismĂ€ĂŸigkeiten Ă€ndern. Diese Arbeit prĂ€sentiert neue Herangehensweisen, sowohl zur Analyse als auch zur Modellierung der ZuverlĂ€ssigkeit von Schaltkreisen, um zuvor genannte Herausforderungen auf mehreren Designebenen anzugehen. Diese Herangehensweisen unterteilen sich in konventionelle Techniken ((a), (b), (c) und (d)) und unkonventionelle Techniken ((e) und (f)), wie folgt: (a)\textbf{(a)} Analyse von Leistungszunahmen in Zusammenhang mit der Maximierung von Leistungseffizienz beim Betrieb nahe der Transistor Schwellspannung, insbesondere am optimalen Leistungspunkt. Das genaue Ermitteln eines solchen optimalen Leistungspunkts ist eine besondere Herausforderung bei Multicore Designs, da dieser sich mit den jeweiligen Optimierungszielsetzungen und der Arbeitsbelastung verschiebt. (b)\textbf{(b)} Aufzeigen versteckter Interdependenzen zwischen Alterungseffekten bei Transistoren und Schwankungen in der Versorgungsspannung durch „IR-drops“. Eine neuartige Technik wird vorgestellt, die sowohl Über- als auch UnterschĂ€tzungen bei der Ermittlung des zeitlichen Sicherheitsabstands vermeidet und folglich den kleinsten, dennoch ausreichenden Sicherheitsabstand ermittelt. (c)\textbf{(c)} EindĂ€mmung von Alterungseffekten bei Transistoren durch „Graceful Approximation“, eine Technik zur Erhöhung der Taktfrequenz bei Bedarf. Der durch Alterungseffekte bedingte zeitlich Sicherheitsabstand wird durch Approximate Computing Techniken ersetzt. Des Weiteren wird Quantisierung verwendet um ausreichend Genauigkeit bei den Berechnungen zu gewĂ€hrleisten. (d)\textbf{(d)} EindĂ€mmung von temperaturabhĂ€ngigen Verschlechterungen der Signallaufzeit durch den Betrieb nahe des Null-Temperatur Koeffizienten (N-ZTC). Der Betrieb bei N-ZTC minimiert temperaturbedingte Abweichungen der Performance und der Leistungsaufnahme. Qualitative und quantitative Vergleiche gegenĂŒber dem traditionellen zeitlichen Sicherheitsabstand werden prĂ€sentiert. (e)\textbf{(e)} Modellierung von Power-Management Techniken fĂŒr NCFET-basierte Prozessoren. Die NCFET Technologie hat einzigartige Eigenschaften, durch die herkömmliche Verfahren zur Spannungs- und Frequenzskalierungen zur Laufzeit (DVS/DVFS) suboptimale Ergebnisse erzielen. Dies erfordert NCFET-spezifische Power-Management Techniken, die in dieser Arbeit vorgestellt werden. (f)\textbf{(f)} Vorstellung eines neuartigen heterogenen Multicore Designs in NCFET Technologie. Das Design beinhaltet identische Kerne; HeterogenitĂ€t entsteht durch die Anwendung der individuellen, optimalen Konfiguration der Kerne. Amdahls Gesetz wird erweitert, um neue system- und anwendungsspezifische Parameter abzudecken und die VorzĂŒge des neuen Designs aufzuzeigen. Die Auswertungen der vorgestellten Techniken werden mithilfe von Implementierungen und Simulationen auf Schaltkreisebene (gate-level) durchgefĂŒhrt. Des Weiteren werden Simulatoren auf Systemebene (system-level) verwendet, um Multicore Designs zu implementieren und zu simulieren. Zur Validierung und Bewertung der EffektivitĂ€t gegenĂŒber dem Stand der Technik werden analytische, gate-level und system-level Simulationen herangezogen, die sowohl synthetische als auch reale Anwendungen betrachten

    Energy Aware Runtime Systems for Elastic Stream Processing Platforms

    Get PDF
    Following an invariant growth in the required computational performance of processors, the multicore revolution started around 20 years ago. This revolution was mainly an answer to power dissipation constraints restricting the increase of clock frequency in single-core processors. The multicore revolution not only brought in the challenge of parallel programming, i.e. being able to develop software exploiting the entire capabilities of manycore architectures, but also the challenge of programming heterogeneous platforms. The question of “on which processing element to map a specific computational unit?”, is well known in the embedded community. With the introduction of general-purpose graphics processing units (GPGPUs), digital signal processors (DSPs) along with many-core processors on different system-on-chip platforms, heterogeneous parallel platforms are nowadays widespread over several domains, from consumer devices to media processing platforms for telecom operators. Finding mapping together with a suitable hardware architecture is a process called design-space exploration. This process is very challenging in heterogeneous many-core architectures, which promise to offer benefits in terms of energy efficiency. The main problem is the exponential explosion of space exploration. With the recent trend of increasing levels of heterogeneity in the chip, selecting the parameters to take into account when mapping software to hardware is still an open research topic in the embedded area. For example, the current Linux scheduler has poor performance when mapping tasks to computing elements available in hardware. The only metric considered is CPU workload, which as was shown in recent work does not match true performance demands from the applications. Doing so may produce an incorrect allocation of resources, resulting in a waste of energy. The origin of this research work comes from the observation that these approaches do not provide full support for the dynamic behavior of stream processing applications, especially if these behaviors are established only at runtime. This research will contribute to the general goal of developing energy-efficient solutions to design streaming applications on heterogeneous and parallel hardware platforms. Streaming applications are nowadays widely spread in the software domain. Their distinctive characiteristic is the retrieving of multiple streams of data and the need to process them in real time. The proposed work will develop new approaches to address the challenging problem of efficient runtime coordination of dynamic applications, focusing on energy and performance management.Efter en oförĂ€nderlig tillvĂ€xt i prestandakrav hos processorer, började den flerkĂ€rniga processor-revolutionen för ungefĂ€r 20 Ă„r sedan. Denna revolution skedde till största del som en lösning till begrĂ€nsningar i energieffekten allt eftersom klockfrekvensen kontinuerligt höjdes i en-kĂ€rniga processorer. Den flerkĂ€rniga processor-revolutionen medförde inte enbart utmaningen gĂ€llande parallellprogrammering, m.a.o. förmĂ„gan att utveckla mjukvara som anvĂ€nder sig av alla delelement i de flerkĂ€rniga processorerna, men ocksĂ„ utmaningen med programmering av heterogena plattformar. FrĂ„gestĂ€llningen ”pĂ„ vilken processorelement skall en viss berĂ€kning utföras?” Ă€r vĂ€l kĂ€nt inom ramen för inbyggda datorsystem. Efter introduktionen av grafikprocessorer för allmĂ€nna berĂ€kningar (GPGPU), signalprocesserings-processorer (DSP) samt flerkĂ€rniga processorer pĂ„ olika system-on-chip plattformar, Ă€r heterogena parallella plattformar idag omfattande inom mĂ„nga domĂ€ner, frĂ„n konsumtionsartiklar till mediaprocesseringsplattformar för telekommunikationsoperatörer. Processen att placera berĂ€kningarna pĂ„ en passande hĂ„rdvaruplattform kallas för utforskning av en designrymd (design-space exploration). Denna process Ă€r mycket utmanande för heterogena flerkĂ€rniga arkitekturer, och kan medföra fördelar nĂ€r det gĂ€ller energieffektivitet. Det största problemet Ă€r att de olika valmöjligheterna i designrymden kan vĂ€xa exponentiellt. Enligt den nuvarande trenden som förespĂ„r ökad heterogeniska aspekter i processorerna Ă€r utmaningen att hitta den mest passande placeringen av berĂ€kningarna pĂ„ hĂ„rdvaran Ă€nnu en forskningsfrĂ„ga inom ramen för inbyggda datorsystem. Till exempel, den nuvarande schemalĂ€ggaren i Linux operativsystemet Ă€r inkapabel att hitta en effektiv placering av berĂ€kningarna pĂ„ den underliggande hĂ„rdvaran. Det enda mĂ€tsĂ€ttet som anvĂ€nds Ă€r processorns belastning vilket, som visats i tidigare forskning, inte motsvarar den verkliga prestandan i applikationen. AnvĂ€ndning av detta mĂ€tsĂ€tt vid resursallokering resulterar i slöseri med energi. Denna forskning hĂ€rstammar frĂ„n observationerna att dessa tillvĂ€gagĂ„ngssĂ€tt inte stöder det dynamiska beteendet hos ström-processeringsapplikationer (stream processing applications), speciellt om beteendena bara etableras vid körtid. Denna forskning kontribuerar till det allmĂ€nna mĂ„let att utveckla energieffektiva lösningar för ström-applikationer (streaming applications) pĂ„ heterogena flerkĂ€rniga hĂ„rdvaruplattformar. Ström-applikationer Ă€r numera mycket vanliga i mjukvarudomĂ€n. Deras distinkta karaktĂ€r Ă€r inlĂ€sning av flertalet dataströmmar, och behov av att processera dem i realtid. Arbetet i denna forskning understöder utvecklingen av nya sĂ€tt för att lösa det utmanade problemet att effektivt koordinera dynamiska applikationer i realtid och fokus pĂ„ energi- och prestandahantering

    Building Efficient and Reliable Emerging Technology Systems

    Full text link
    The semiconductor industry has been reaping the benefits of Moore’s law powered by Dennard’s voltage scaling for the past fifty years. However, with the end of Dennard scaling, silicon chip manufacturers are facing a widespread plateau in performance improvements. While the architecture community has focused its effort on exploring parallelism, such as with multi-core, many-core and accelerator-based systems, chip manufacturers have been forced to explore beyond-Moore technologies to improve performance while maintaining power density. Examples of such technologies include monolithic 3D integration, carbon nanotube transistors, tunneling-based transistors, spintronics and quantum computing. However, the infancy of the manufacturing process of these new technologies impedes their usage in commercial products. The goal of this dissertation is to combine both architectural and device-level efforts to provide solutions across the computing stack that can overcome the reliability concerns of emerging technologies. This allows for beyond-Moore systems to compete with highly optimized silicon-based processors, thus, enabling faster commercialization of such systems. This dissertation proposes the following key steps: (i) Multifaceted understanding and modeling of variation and yield issues that occur in emerging technologies, such as carbon nanotube transistors (CNFETs). (ii) Design of systems using suitable logic families such as pass transistor logic that provide high performance. (iii) Design of a multi-granular fault-tolerant reconfigurable architecture that enhances yield and performance. (iv) Design of a multi-technology, multi-accelerator heterogeneous system (v) Development of real-time constrained efficient workload scheduling mechanism for heterogeneous systems. This dissertation first presents the use of pass transistor logic family as an alternate to the CMOS logic family for CNFETs to improve performance. It explores various architectural design choices for CNFETs using pass transistor logic (PTL) to create an energy-efficient RISC-V processor. Our results show that while a CNFET RISC-V processor using CMOS logic achieves a 2.9x energy-delay product (EDP) improvement over a silicon design, using PTL along the critical path components of the processor can boost EDP improvement by 5x as well as reduce area by 17% over 16 nm silicon CMOS. This document further builds on providing fault-tolerant and yield enhancing solutions for emerging 3D integration compatible technologies in the context of CNFETs. The proposed framework can efficiently support high-variation technologies by providing protection against manufacturing defects at multiple granularities: module and pipeline-stage levels. Based on the variation observed in a synthesized design, a reliable CNFET-based 3D multi-granular reconfigurable architecture, 3DTUBE, is presented to overcome the manufacturing difficulties. For 0.4-0.7 V, 3DTUBE provides up to 6.0x higher throughput and 3.1x lower EDP compared to a silicon-based multi-core design evaluated at 1 part per billion transistor failure rate, which is 10,000x lower in comparison to CNFET’s failure rate. This dissertation then ventures into building multi-accelerator heterogeneous systems and real-time schedulers that cater to the requirements of the applications while taking advantage of the underlying heterogeneous system. We introduce optimizations like task pruning, hierarchical hetero-ranking and rank update built upon two scheduler policies (MS-static and MS-dynamic), that result in a performance improvement of 3.5x (average) for real-world autonomous vehicle applications, when compared against state-of-the-art schedulers. Adopting insights from the above work, this thesis presents a multi-accelerator, multi-technology heterogeneous system powered by a multi-constrained scheduler that optimizes for varying task requirements to achieve up to 6.1x better energy over a baseline silicon-based system.PHDElectrical and Computer EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/169699/1/aporvaa_1.pd

    Hardware / Software Architectural and Technological Exploration for Energy-Efficient and Reliable Biomedical Devices

    Get PDF
    Nowadays, the ubiquity of smart appliances in our everyday lives is increasingly strengthening the links between humans and machines. Beyond making our lives easier and more convenient, smart devices are now playing an important role in personalized healthcare delivery. This technological breakthrough is particularly relevant in a world where population aging and unhealthy habits have made non-communicable diseases the first leading cause of death worldwide according to international public health organizations. In this context, smart health monitoring systems termed Wireless Body Sensor Nodes (WBSNs), represent a paradigm shift in the healthcare landscape by greatly lowering the cost of long-term monitoring of chronic diseases, as well as improving patients' lifestyles. WBSNs are able to autonomously acquire biological signals and embed on-node Digital Signal Processing (DSP) capabilities to deliver clinically-accurate health diagnoses in real-time, even outside of a hospital environment. Energy efficiency and reliability are fundamental requirements for WBSNs, since they must operate for extended periods of time, while relying on compact batteries. These constraints, in turn, impose carefully designed hardware and software architectures for hosting the execution of complex biomedical applications. In this thesis, I develop and explore novel solutions at the architectural and technological level of the integrated circuit design domain, to enhance the energy efficiency and reliability of current WBSNs. Firstly, following a top-down approach driven by the characteristics of biomedical algorithms, I perform an architectural exploration of a heterogeneous and reconfigurable computing platform devoted to bio-signal analysis. By interfacing a shared Coarse-Grained Reconfigurable Array (CGRA) accelerator, this domain-specific platform can achieve higher performance and energy savings, beyond the capabilities offered by a baseline multi-processor system. More precisely, I propose three CGRA architectures, each contributing differently to the maximization of the application parallelization. The proposed Single, Multi and Interleaved-Datapath CGRA designs allow the developed platform to achieve substantial energy savings of up to 37%, when executing complex biomedical applications, with respect to a multi-core-only platform. Secondly, I investigate how the modeling of technology reliability issues in logic and memory components can be exploited to adequately adjust the frequency and supply voltage of a circuit, with the aim of optimizing its computing performance and energy efficiency. To this end, I propose a novel framework for workload-dependent Bias Temperature Instability (BTI) impact analysis on biomedical application results quality. Remarkably, the framework is able to determine the range of safe circuit operating frequencies without introducing worst-case guard bands. Experiments highlight the possibility to safely raise the frequency up to 101% above the maximum obtained with the classical static timing analysis. Finally, through the study of several well-known biomedical algorithms, I propose an approach allowing energy savings by dynamically and unequally protecting an under-powered data memory in a new way compared to regular error protection schemes. This solution relies on the Dynamic eRror compEnsation And Masking (DREAM) technique that reduces by approximately 21% the energy consumed by traditional error correction codes

    Cost-Efficient Soft-Error Resiliency for ASIP-based Embedded Systems

    Full text link
    Recent decades have witnessed the rapid growth of embedded systems. At present, embedded systems are widely applied in a broad range of critical applications including automotive electronics, telecommunication, healthcare, industrial electronics, consumer electronics military and aerospace. Human society will continue to be greatly transformed by the pervasive deployment of embedded systems. Consequently, substantial amount of efforts from both industry and academic communities have contributed to the research and development of embedded systems. Application-specific instruction-set processor (ASIP) is one of the key advances in embedded processor technology, and a crucial component in some embedded systems. Soft errors have been directly observed since the 1970s. As devices scale, the exponential increase in the integration of computing systems occurs, which leads to correspondingly decrease in the reliability of computing systems. Today, major research forums state that soft errors are one of the major design technology challenges at and beyond the 22 nm technology node. Therefore, a large number of soft-error solutions, including error detection and recovery, have been proposed from differing perspectives. Nonetheless, most of the existing solutions are designed for general or high-performance systems which are different to embedded systems. For embedded systems, the soft-error solutions must be cost-efficient, which requires the tailoring of the processor architecture with respect to the feature of the target application. This thesis embodies a series of explorations for cost-efficient soft-error solutions for ASIP-based embedded systems. In this exploration, five major solutions are proposed. The first proposed solution realizes checkpoint recovery in ASIPs. By generating customized instructions, ASIP-implemented checkpoint recovery can perform at a finer granularity than what was previously possible. The fault-free performance overhead of this solution is only 1.45% on average. The recovery delay is only 62 cycles at the worst case. The area and leakage power overheads are 44.4% and 45.6% on average. The second solution explores utilizing two primitive error recovery techniques jointly. This solution includes three application-specific optimization methodologies. This solution generates the optimized error-resilient ASIPs, based on the characteristics of primitive error recovery techniques, static reliability analysis and design constraints. The resultant ASIP can be configured to perform at runtime according to the optimized recovery scheme. This solution can strategically enhance cost-efficiency for error recovery. In order to guarantee cost-efficiency in unpredictable runtime situations, the third solution explores runtime adaptation for error recovery. This solution aims to budget and adapt the error recovery operations, so as to spend the resources intelligently and to tolerate adverse influences of runtime variations. The resultant ASIP can make runtime decisions to determine the activation of spatial and temporal redundancies, according to the runtime situations. At the best case, this solution can achieve almost 50x reliability gain over the state of the art solutions. Given the increasing demand for multi-core computing systems, the last two proposed solutions target error recovery in multi-core ASIPs. The first solution of these two explores ASIP-implemented fine-grained process migration. This solution is a key infrastructure, which allows cost-efficient task management, for realizing cost-efficient soft-error recovery in multi-core ASIPs. The average time cost is only 289 machine cycles to perform process migration. The last solution explores using dynamic and adaptive mapping to assign heterogeneous recovery operations to the tasks in the multi-core context. This solution allows each individual ASIP-based processing core to dynamically adapt its specific error recovery functionality according to the corresponding task's characteristics, in terms of soft error vulnerability and execution time deadline. This solution can significantly improve the reliability of the system by almost two times, with graceful constraint penalty, in comparison to the state-of-the-art counterparts

    Reliable Software for Unreliable Hardware - A Cross-Layer Approach

    Get PDF
    A novel cross-layer reliability analysis, modeling, and optimization approach is proposed in this thesis that leverages multiple layers in the system design abstraction (i.e. hardware, compiler, system software, and application program) to exploit the available reliability enhancing potential at each system layer and to exchange this information across multiple system layers

    Thermal Management for Dependable On-Chip Systems

    Get PDF
    This thesis addresses the dependability issues in on-chip systems from a thermal perspective. This includes an explanation and analysis of models to show the relationship between dependability and tempature. Additionally, multiple novel methods for on-chip thermal management are introduced aiming to optimize thermal properties. Analysis of the methods is done through simulation and through infrared thermal camera measurements

    Adaptive monitoring and control framework in Application Service Management environment

    Get PDF
    The economics of data centres and cloud computing services have pushed hardware and software requirements to the limits, leaving only very small performance overhead before systems get into saturation. For Application Service Management–ASM, this carries the growing risk of impacting the execution times of various processes. In order to deliver a stable service at times of great demand for computational power, enterprise data centres and cloud providers must implement fast and robust control mechanisms that are capable of adapting to changing operating conditions while satisfying service–level agreements. In ASM practice, there are normally two methods for dealing with increased load, namely increasing computational power or releasing load. The first approach typically involves allocating additional machines, which must be available, waiting idle, to deal with high demand situations. The second approach is implemented by terminating incoming actions that are less important to new activity demand patterns, throttling, or rescheduling jobs. Although most modern cloud platforms, or operating systems, do not allow adaptive/automatic termination of processes, tasks or actions, it is administrators’ common practice to manually end, or stop, tasks or actions at any level of the system, such as at the level of a node, function, or process, or kill a long session that is executing on a database server. In this context, adaptive control of actions termination remains a significantly underutilised subject of Application Service Management and deserves further consideration. For example, this approach may be eminently suitable for systems with harsh execution time Service Level Agreements, such as real–time systems, or systems running under conditions of hard pressure on power supplies, systems running under variable priority, or constraints set up by the green computing paradigm. Along this line of work, the thesis investigates the potential of dimension relevance and metrics signals decomposition as methods that would enable more efficient action termination. These methods are integrated in adaptive control emulators and actuators powered by neural networks that are used to adjust the operation of the system to better conditions in environments with established goals seen from both system performance and economics perspectives. The behaviour of the proposed control framework is evaluated using complex load and service agreements scenarios of systems compatible with the requirements of on–premises, elastic compute cloud deployments, server–less computing, and micro–services architectures
    corecore