Search CORE

10 research outputs found

Recommended from our members

Characterization and management of voltage noise in multi-core, multi-threaded processors

Author: Kim Youngtaek
Publication venue
Publication date: 14/07/2014
Field of study

textReliability is one of the important issues of recent microprocessor design. Processors must provide correct behavior as users expect, and must not fail at any time. However, unreliable operation can be caused by excessive supply voltage fluctuations due to an inductive part in a microprocessor power distribution network. This voltage fluctuation issue is referred to as inductive or di/dt noise, and requires thorough analysis and sophisticated design solutions. This dissertation proposes an automated stressmark generation framework to characterize di/dt noise effect, and suggests a practical solution for management of di/dt effects while achieving performance and energy goals. First, the di/dt noise issue is analyzed from theory to a practical view. Inductance is a parasitic part in power distribution network for microprocessor, and its characteristics such as resonant frequencies are reviewed. Then, it is shown that supply voltage fluctuation from resonant behavior is much harmful than single event voltage fluctuations. Voltage fluctuations caused by standard benchmarks such as SPEC CPU2006, PARSEC, Linpack, etc. are studied. Next, an AUtomated DI/dT stressmark generation framework, referred to as AUDIT, is proposed to identify maximum voltage droop in a microprocessor power distribution network. The di/dt stressmark generated from AUDIT framework is an instruction sequence, which draws periodic high and low current pulses that maximize voltage fluctuations including voltage droops. AUDIT uses a Genetic Algorithm in scheduling and optimizing candidate instruction sequences to create a maximum voltage droop. In addition, AUDIT provides with both simulation and hardware measurement methods for finding maximum voltage droops in different design and verification stages of a processor. Failure points in hardware due to voltage droops are analyzed. Finally, a hardware technique, floating-point (FP) issue throttling, is examined, which provides a reduction in worst case voltage droop. This dissertation shows the impact of floating point throttling on voltage droop, and translates this reduction in voltage droop to an increase in operating frequency because additional guardband is no longer required to guard against droops resulting from heavy floating point usage. This dissertation presents two techniques to dynamically determine when to tradeoff FP throughput for reduced voltage margin and increased frequency. These techniques can work in software level without any modification of existing hardware.Electrical and Computer Engineerin

Texas ScholarWorks

Hardware/Software Co-design for Multicore Architectures

Author: Xu Thomas Canhao
Publication venue: Turku Centre for Computer Science
Publication date: 12/09/2012
Field of study

Siirretty Doriast

UTUPub

Static task mapping for tiled chip multiprocessors with multiple voltage islands

Author: Cortadella Jordi
Nikitin Nikita
Publication venue
Publication date: 01/01/2011
Field of study

The complexity of large Chip Multiprocessors (CMP) makes design reuse a practical approach to reduce the manufacturing and design cost of high-performance systems. This paper proposes techniques for static task mapping onto general-purpose CMPs with multiple pre-defined voltage islands for power management. The CMPs are assumed to contain different classes of processing elements with multiple voltage/frequency execution modes to better cover a large range of applications. Task mapping is performed with awareness of both on-chip and off-chip memory traffic, and communication constraints such as the link and memory bandwidth. Besides proposing a linear programming model for small systems, a novel mapping approach based on Extremal Optimization is proposed for large-scale CMPs. This new combinatorial optimization method has delivered very good results in quality and computational cost when compared to the classical simulated annealing.Postprint (published version

UPCommons. Portal del coneixement obert de la UPC

Reliability in the face of variability in nanometer embedded memories

Author: Ganapathy Shrikanth
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2014
Field of study

In this thesis, we have investigated the impact of parametric variations on the behaviour of one performance-critical processor structure - embedded memories. As variations manifest as a spread in power and performance, as a first step, we propose a novel modeling methodology that helps evaluate the impact of circuit-level optimizations on architecture-level design choices. Choices made at the design-stage ensure conflicting requirements from higher-levels are decoupled. We then complement such design-time optimizations with a runtime mechanism that takes advantage of adaptive body-biasing to lower power whilst improving performance in the presence of variability. Our proposal uses a novel fully-digital variation tracking hardware using embedded DRAM (eDRAM) cells to monitor run-time changes in cache latency and leakage. A special fine-grain body-bias generator uses the measurements to generate an optimal body-bias that is needed to meet the required yield targets. A novel variation-tolerant and soft-error hardened eDRAM cell is also proposed as an alternate candidate for replacing existing SRAM-based designs in latency critical memory structures. In the ultra low-power domain where reliable operation is limited by the minimum voltage of operation (Vddmin), we analyse the impact of failures on cache functional margin and functional yield. Towards this end, we have developed a fully automated tool (INFORMER) capable of estimating memory-wide metrics such as power, performance and yield accurately and rapidly. Using the developed tool, we then evaluate the #effectiveness of a new class of hybrid techniques in improving cache yield through failure prevention and correction. Having a holistic perspective of memory-wide metrics helps us arrive at design-choices optimized simultaneously for multiple metrics needed for maintaining lifetime requirements

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

Optimisation de dispositifs FDSOI pour la gestion de la consommation et de la vitesse (application aux mémoires et fonctions logiques)

Author: AMARA Amara
NOEL Jean-philippe
Publication venue
Publication date: 01/01/2011
Field of study

Avec la percée des téléphones portables et des tablettes numériques intégrant des fonctions avancées de traitement de l'information, une croissance exponentielle du marché des systèmes sur puce (SoC pour System On Chip en anglais) est attendue jusqu'en 2016. Ces systèmes, conçus dans les dernières technologies nanométriques, nécessitent des vitesses de fonctionnement très élevées pour offrir des performances incroyables, tout en consommant remarquablement peu. Cependant, concevoir de tels systèmes à l'échelle nanométrique présente de nombreux enjeux en raison de l'accentuation d'effets parasites avec la miniaturisation des transistors MOS sur silicium massif, rendant les circuits plus sensibles aux phénomènes de fluctuations des procédés de fabrication et moins efficaces énergétiquement. La technologie planaire complètement désertée (FD pour Fully depleted en anglais) SOI, offrant un meilleur contrôle du canal du transistor et une faible variabilité de sa tension de seuil grâce à un film de silicium mince et non dopé, apparaît comme une solution technologique très bien adaptée pour répondre aux besoins de ces dispositifs nomades alliant hautes performances et basse consommation. Cependant pour que cette technologie soit viable, il est impératif qu'elle réponde aux besoins des plateformes de conception basse consommation. Un des défis majeurs de l'état de l'art de la technologie planaire FDSOI est de fournir les différentes tensions de seuils (VT) requises pour la gestion de la consommation et de la vitesse. Le travail de recherche de thèse présenté dans ce mémoire a contribué à la mise en place d'une plateforme de conception multi-VT en technologie planaire FDSOI sur oxyde enterré mince (UTB pour Ultra Thin Buried oxide en anglais) pour les nœuds technologiques sub-32 nm. Pour cela, les éléments clefs des plateformes de conception basse consommation en technologie planaire sur silicium massif ont été identifiés. A la suite de cette analyse, différentes architectures de transistors MOS multi-VT FDSOI ont été développées. L'analyse au niveau des circuits numériques et mémoires élémentaires a permis de mettre en avant deux solutions fiables, efficaces et de faible complexité technologique. Les performances des solutions apportées ont été évaluées sur un chemin critique extrait du cœur de processeur ARM Cortex A9 et sur une cellule SRAM 6T haute densité (0,120 m ). Egalement, une cellule SRAM à quatre transistors est proposée, démontrant la flexibilité au niveau conception des solutions proposées. Ce travail de recherche a donné lieu à de nombreuses publications, communications et brevets. Aujourd'hui, la majorité des résultats obtenus ont été transférés chez STMicroelectronics, où l'étude de leur industrialisation est en cours.Driven by the strong growth of smartphone and tablet devices, an exponential growth for the mobile SoC market is forecasted up to 2016. These systems, designed in the latest nanometre technology, require very high speeds to deliver tremendous performances, while consuming remarkably little. However, designing such systems at the nanometre scale introduces many challenges due to the emphasis of parasitic phenomenon effects driven by the scaling of bulk MOSFETs, making circuits more sensitive to the manufacturing process fluctuations and less energy efficient. Undoped thin-film planar fully depleted silicon-on-insulator (FDSOI) devices are being investigated as an alternative to bulk devices in 28nm node and beyond, thanks to its excellent short-channel electrostatic control, low leakage currents and immunity to random dopant fluctuation. This compelling technology appears to meet the needs of nomadic devices, combining high performance and low power consumption. However, to be useful, it is essential that this technology is compatible with low operating power design platforms. A major challenge for this technology is to provide various device threshold voltages (VT), trading off power consumption and speed. The research work presented in this thesis has contributed to the development of a multi-VT design platform in FDSOI planar technology on thin buried oxide (UTB) for the 28nm and below technology nodes. In this framework, the key elements of the low power design platform in bulk planar technology have been studied. Based on this analysis, different architectures of FDSOI multi-VT MOSFETs have been developed. The analysis on the layout of elementary circuits, such as standard cells and SRAM cells, has put forward two reliable, efficient and low technological complexity multi- strategies. Finally, the performances of these solutions have been evaluated on a critical path extracted from the ARM Cortex A9 processor and a high-density 6T SRAM cell (0.120 m ). Also, an SRAM cell with four transistors has been proposed, highlighting the design flexibility brought by these solutions. This thesis has resulted in many publications, communications and patents. Today, the majority of the results obtained have been transferred to STMicroelectronics, where the industrialization is in progress.SAVOIE-SCD - Bib.électronique (730659901) / SudocGRENOBLE1/INP-Bib.électronique (384210012) / SudocGRENOBLE2/3-Bib.électronique (384219901) / SudocSudocFranceF

OpenGrey Repository

Performance Analysis of Complex Shared Memory Systems

Author: Molka Daniel
Publication venue
Publication date: 10/03/2017
Field of study

Systems for high performance computing are getting increasingly complex. On the one hand, the number of processors is increasing. On the other hand, the individual processors are getting more and more powerful. In recent years, the latter is to a large extent achieved by increasing the number of cores per processor. Unfortunately, scientific applications often fail to fully utilize the available computational performance. Therefore, performance analysis tools that help to localize and fix performance problems are indispensable. Large scale systems for high performance computing typically consist of multiple compute nodes that are connected via network. Performance analysis tools that analyze performance problems that arise from using multiple nodes are readily available. However, the increasing number of cores per processor that can be observed within the last decade represents a major change in the node architecture. Therefore, this work concentrates on the analysis of the node performance. The goal of this thesis is to improve the understanding of the achieved application performance on existing hardware. It can be observed that the scaling of parallel applications on multi-core processors differs significantly from the scaling on multiple processors. Therefore, the properties of shared resources in contemporary multi-core processors as well as remote accesses in multi-processor systems are investigated and their respective impact on the application performance is analyzed. As a first step, a comprehensive suite of highly optimized micro-benchmarks is developed. These benchmarks are able to determine the performance of memory accesses depending on the location and coherence state of the data. They are used to perform an in-depth analysis of the characteristics of memory accesses in contemporary multi-processor systems, which identifies potential bottlenecks. However, in order to localize performance problems, it also has to be determined to which extend the application performance is limited by certain resources. Therefore, a methodology to derive metrics for the utilization of individual components in the memory hierarchy as well as waiting times caused by memory accesses is developed in the second step. The approach is based on hardware performance counters, which record the number of certain hardware events. The developed micro-benchmarks are used to selectively stress individual components, which can be used to identify the events that provide a reasonable assessment for the utilization of the respective component and the amount of time that is spent waiting for memory accesses to complete. Finally, the knowledge gained from this process is used to implement a visualization of memory related performance issues in existing performance analysis tools. The results of the micro-benchmarks reveal that the increasing number of cores per processor and the usage of multiple processors per node leads to complex systems with vastly different performance characteristics of memory accesses depending on the location of the accessed data. Furthermore, it can be observed that the aggregated throughput of shared resources in multi-core processors does not necessarily scale linearly with the number of cores that access them concurrently, which limits the scalability of parallel applications. It is shown that the proposed methodology for the identification of meaningful hardware performance counters yields useful metrics for the localization of memory related performance limitations

Technische Universität Dresden: Qucosa

A Unified Infrastructure for Monitoring and Tuning the Energy Efficiency of HPC Applications

Author: Schöne Robert
Publication venue
Publication date: 19/09/2017
Field of study

High Performance Computing (HPC) has become an indispensable tool for the scientific community to perform simulations on models whose complexity would exceed the limits of a standard computer. An unfortunate trend concerning HPC systems is that their power consumption under high-demanding workloads increases. To counter this trend, hardware vendors have implemented power saving mechanisms in recent years, which has increased the variability in power demands of single nodes. These capabilities provide an opportunity to increase the energy efficiency of HPC applications. To utilize these hardware power saving mechanisms efficiently, their overhead must be analyzed. Furthermore, applications have to be examined for performance and energy efficiency issues, which can give hints for optimizations. This requires an infrastructure that is able to capture both, performance and power consumption information concurrently. The mechanisms that such an infrastructure would inherently support could further be used to implement a tool that is able to do both, measuring and tuning of energy efficiency. This thesis targets all steps in this process by making the following contributions: First, I provide a broad overview on different related fields. I list common performance measurement tools, power measurement infrastructures, hardware power saving capabilities, and tuning tools. Second, I lay out a model that can be used to define and describe energy efficiency tuning on program region scale. This model includes hardware and software dependent parameters. Hardware parameters include the runtime overhead and delay for switching power saving mechanisms as well as a contemplation of their scopes and the possible influence on application performance. Thus, in a third step, I present methods to evaluate common power saving mechanisms and list findings for different x86 processors. Software parameters include their performance and power consumption characteristics as well as the influence of power-saving mechanisms on these. To capture software parameters, an infrastructure for measuring performance and power consumption is necessary. With minor additions, the same infrastructure can later be used to tune software and hardware parameters. Thus, I lay out the structure for such an infrastructure and describe common components that are required for measuring and tuning. Based on that, I implement adequate interfaces that extend the functionality of contemporary performance measurement tools. Furthermore, I use these interfaces to conflate performance and power measurements and further process the gathered information for tuning. I conclude this work by demonstrating that the infrastructure can be used to manipulate power-saving mechanisms of contemporary x86 processors and increase the energy efficiency of HPC applications

Technische Universität Dresden: Qucosa

Adaptive Distributed Architectures for Future Semiconductor Technologies.

Author: Pellegrini Andrea
Publication venue
Publication date: 01/01/2013
Field of study

Year after year semiconductor manufacturing has been able to integrate more components in a single computer chip. These improvements have been possible through systematic shrinking in the size of its basic computational element, the transistor. This trend has allowed computers to progressively become faster, more efficient and less expensive. As this trend continues, experts foresee that current computer designs will face new challenges, in utilizing the minuscule devices made available by future semiconductor technologies. Today's microprocessor designs are not fit to overcome these challenges, since they are constrained by their inability to handle component failures by their lack of adaptability to a wide range of custom modules optimized for specific applications and by their limited design modularity. The focus of this thesis is to develop original computer architectures, that can not only survive these new challenges, but also leverage the vast number of transistors available to unlock better performance and efficiency. The work explores and evaluates new software and hardware techniques to enable the development of novel adaptive and modular computer designs. The thesis first explores an infrastructure to quantitatively assess the fallacies of current systems and their inadequacy to operate on unreliable silicon. In light of these findings, specific solutions are then proposed to strengthen digital system architectures, both through hardware and software techniques. The thesis culminates with the proposal of a radically new architecture design that can fully adapt dynamically to operate on the hardware resources available on chip, however limited or abundant those may be.PHDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/102405/1/apellegr_1.pd

Deep Blue Documents at the University of Michigan

Recommended from our members

Process Variation in Silicon Photonic Devices

Author: Chen Xi
Publication venue: University of Colorado Boulder
Publication date: 01/01/2013
Field of study

The high index contrast of the silicon - silicon dioxide material system allows for dense integration of optical waveguide devices. Possible applications include intra-chip, inter-chip and fiber optic interconnection systems. Optical intra-chip interconnections become more desirable as the complementary metal-oxide-semiconductor (CMOS) circuit density puts ever tighter constraint on on-chip interconnection performance. Board level, rack level and rack-to-rack data center interconnections are ever more constrained by space and bandwidth to which silicon photonic modules may offer an improvement. As fiber optic systems serve smaller and smaller area systems, integrated switching systems that are enabled by silicon photonic devices involving wavelength division multiplexing (WDM) become more desirable. In this thesis, we firstly take a brief review of the development history of information technology, optical communication and silicon photonics. Secondly we examine the optical performance of an array of photonic devices which are the basic building blocks for silicon photonic circuits. Thirdly we turn the attention to the fabrication related issues. Silicon photonic circuits are prone to the thermal and fabrication induced process variations. We discover the process variation exhibits a “random walk” pattern with spatial extent at wafer scale. Fourthly we propose a simple method to extract fundamental parameters out of fabricated silicon photonic devices. Based on the systemic wafer-scale measurement results, our method combines the advantage of both numerical simulation and simple analytical modeling techniques. Lastly, we propose a variation-aware on-chip interconnect design for multi-core processors. This design adapts to on-chip thermal and process variation effects, pointing to the improvement of wafer-scale fabrication yield and interconnect network communication throughput

CU Scholar Institutional Repository

Recommended from our members

Combined C-V/I-V and RTN CMOS Variability Characterization Using An On-Chip Measurement System

Author: Realov Simeon Dimitrov
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2012
Field of study

With the number of transistors integrated into a single integrated circuit (IC) crossing the one-billion mark and complementary metal-oxide-semiconductor (CMOS) technology scaling pushing device dimensions ever-so-close to atomic scales, variability in transistor performance is becoming the dominant constraint in modern-day CMOS IC design. Developing novel approaches for device characterization, which allow a detailed study of electrical transistor characteristics across large statistical sample sets, is crucial for the proper identification, characterization, and modeling of different physical sources of device variability. On-chip characterization methodologies have the potential to address all of these issues by enabling the characterization of large statistical device sample sets, while also allowing for high measurement quality and throughput. In this work, a fully-integrated system for on-chip combined capacitance-voltage (C-V) and current-voltage (I-V) characterization of a large integrated test transistor array implemented in a 45-nm bulk CMOS process is presented. On-chip I-V characterization is implemented using a four-point Kelvin measurement technique with 12-bit sub-10 nA current measurement resolution, 10-bit sub-1 mV voltage measurement resolution, and sampling speeds on the order of 100 kHz. C-V characterization is performed using a novel leakage- and parasitics-insensitive charge-based capacitance measurement (CBCM) technique with atto-Farad resolution. The on-chip system is employed in developing a comprehensive CMOS transistor variability characterization methodology, studying both random and systematic sources of quasi-static device variability. For the first time, combined C-V/I-V characterization of circuit-representative devices is demonstrated and used to extract variations in the under- lying physical parameters of the device. Additionally, the fast current sampling capabilities of the system are used for the characterization of random telegraph noise (RTN) in small area devices. An automated methodology for the extraction of RTN parameters is developed, and the statistics of RTN are studied across device type, bias, and geometry

Columbia University Academic Commons