32 research outputs found

    Limits on Fundamental Limits to Computation

    An indispensable part of our lives, computing has also become essential to industries and governments. Steady improvements in computer hardware have been supported by periodic doubling of transistor densities in integrated circuits over the last fifty years. Such Moore scaling now requires increasingly heroic efforts, stimulating research in alternative hardware and stirring controversy. To help evaluate emerging technologies and enrich our understanding of integrated-circuit scaling, we review fundamental limits to computation: in manufacturing, energy, physical space, design and verification effort, and algorithms. To outline what is achievable in principle and in practice, we recall how some limits were circumvented and compare loose and tight limits. We also point out that engineering difficulties encountered by emerging technologies may indicate yet-unknown limits. Comment: 15 pages, 4 figures, 1 table

    System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing

    This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications. Hardware implementation of Low-Density Parity-Check decoders is approached at both the algorithmic and the architecture level. Low-Density Parity-Check codes are a promising coding scheme for future communication standards due to their outstanding error correction performance. This work proposes a methodology for analyzing the effects of finite-precision arithmetic on error correction performance and hardware complexity. The methodology is employed throughout for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cells library. Results demonstrate an implementation loss below 0.2 dB down to a BER of 10^{-8} and a complexity saving of up to 59% with respect to other works in recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed that requires as little as 20% of the memory of traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity is reduced by 12%. These advantages are traded off against error correction performance, making the solution attractive only for long codes, such as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to codes not specifically conceived for this technique. The proposed architectures exhibit complexity savings on the order of 40% for both area and power consumption, while the implementation loss is smaller than 0.05 dB. Most modern communication standards employ Orthogonal Frequency Division Multiplexing as part of their physical layer. The core of OFDM is the Fast Fourier Transform and its inverse, which are in charge of symbol (de)modulation. 
Requirements on throughput and energy efficiency call for hardware implementation of the FFT, while the ubiquity of the FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited to the implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operand bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communications standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementation results are presented for two deep sub-micron standard-cells libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and the same system-level performance (throughput, transform size and numerical accuracy). The final part of this dissertation focuses on the Network-on-Chip design paradigm, whose goal is building scalable communication infrastructures connecting hundreds of cores. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link relaxes skew constraints in clock tree synthesis and enables frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz on a 65 nm low-leakage CMOS standard-cells library. 
In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption. Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction, technology-independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an object-oriented framework, integrated within a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the number of configurations to verify by up to three orders of magnitude.
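The accuracy-driven bit-width selection described in this abstract can be illustrated with a minimal sketch. It covers only a plain fixed-point model and a linear search; the function names, the test signal, and the use of signal-to-quantization-noise ratio (SQNR) as the accuracy budget are assumptions made for illustration, not the thesis's actual engine.

```python
import math

def quantize(x, frac_bits):
    """Round x to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def sqnr_db(signal, frac_bits):
    """Signal-to-quantization-noise ratio in dB for a given word length."""
    noise = sum((s - quantize(s, frac_bits)) ** 2 for s in signal)
    power = sum(s * s for s in signal)
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(power / noise)

def min_frac_bits(signal, budget_db, max_bits=32):
    """Smallest fractional bit-width whose SQNR meets the accuracy budget."""
    for bits in range(1, max_bits + 1):
        if sqnr_db(signal, bits) >= budget_db:
            return bits
    raise ValueError("accuracy budget not reachable within max_bits")

# Hypothetical test signal: one sinusoid sampled over 256 points.
signal = [0.9 * math.sin(2 * math.pi * 7 * n / 256 + 0.3) for n in range(256)]
bits_40db = min_frac_bits(signal, 40.0)
bits_60db = min_frac_bits(signal, 60.0)
```

A tighter accuracy budget can only keep or increase the selected width, which is the trade-off between numerical accuracy and circuit complexity that the abstract refers to.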

    Low-Dimensional Materials for Disruptive Microwave Antennas Design

    This chapter is devoted to a complete analysis of the remarkable electromagnetic properties of nanomaterials suitable for antenna miniaturization. After a review of state-of-the-art mesoscopic-scale modeling tools and characterization techniques in the microwave domain, new approaches based on wideband identification of material parameters (complex permittivity and conductivity) are described, obtained from an impedance-equivalence formulation by means of de-embedding techniques applicable in integrated technology or in free space. A focus on the performance of 1D materials such as vertically aligned multi-wall carbon nanotube (VA-MWCNT) bundles, from theory to technology, is presented as a disruptive demonstration for defense and civil applications such as radar systems.

    Electron beam induced deposition (EBID) of carbon interface between carbon nanotube interconnect and metal electrode

    Electron Beam Induced Deposition (EBID) is an emerging additive nanomanufacturing tool which enables growth of complex 3-D parts from a variety of materials with nanoscale resolution. This thesis research investigates the fundamentals of EBID and its application to making a robust, low-contact-resistance electromechanical junction between a Multiwall Carbon Nanotube (MWNT) and a metal electrode. MWNTs are promising candidates for next-generation electrical and electronic devices, and one of the main challenges in MWNT utilization is the high intrinsic contact resistance of the MWNT-metal electrode junction interface. EBID of an amorphous carbon interface has previously been demonstrated to simultaneously lower the electrical contact resistance and improve the mechanical characteristics of the MWNT-electrode junction. In this work, factors contributing to the EBID formation of the carbon joint between a MWNT and an electrode are systematically explored via complementary experimental and theoretical investigations. A comprehensive dynamic model of EBID using residual hydrocarbons as a precursor molecule is developed by coupling the precursor mass transport, electron transport and scattering, and surface deposition reaction. The model is validated by comparison with experiments and is used to identify different EBID growth regimes and the growth rates and shapes of EBID deposits for each regime. In addition, the impact of MWNT properties and of the electron beam impingement location and energy on the EBID-made carbon joint between the MWNT and the metal electrode is critically evaluated. Lastly, the dominant factors contributing to the overall electrical resistance of the MWNT-based electrical interconnect, and the relative importance of the mechanical contact area of the EBID-made carbon joint to the MWNT vs. that to the metal electrode, are determined using carefully designed experiments. Ph.D. Committee Chair: Dr. Andrei G. Fedorov; Committee Member: Dr. Azad Naeemi; Committee Member: Dr. Suresh Sitaraman; Committee Member: Dr. Vladimir V. Tsukruk; Committee Member: Dr. Yogendra Josh

    Analysis of performance variation in 16nm FinFET FPGA devices

    High-Speed and Low-Energy On-Chip Communication Circuits.

    Continuous technology scaling sharply reduces transistor delays, while fixed-length global wire delays have increased because smaller wiring pitch raises resistance and coupling capacitance. Due to this ever-growing gap, long on-chip interconnects pose well-known latency, bandwidth, and energy challenges to high-performance VLSI systems. Repeaters effectively mitigate wire RC effects but do little to improve their energy costs. Moreover, increased complexity and a high level of integration require higher wire densities, worsening crosstalk noise and power consumption of conventionally repeated interconnects. These growing concerns about global on-chip wires motivate circuits that improve wire performance and energy while reducing the number of repeaters. This work presents and investigates circuit techniques for high-performance and energy-efficient on-chip communication in the areas of encoding, data compression, self-timed current injection, signal pre-emphasis, low-swing signaling, and technology mapping. The improved bus designs also consider the constraints of robust operation and performance/energy gains across process corners and design space. Measurement results from 5mm links on 65nm and 90nm prototype chips validate a 2.5-3X improvement in energy-delay product. Ph.D. Electrical Engineering. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/75800/1/jseo_1.pd
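    The energy-delay product quoted above multiplies the energy per transfer by the link delay, so gains in either factor compound. The short sketch below shows the arithmetic; the baseline and improved figures are hypothetical values chosen for illustration and are not measurements from the thesis.

```python
def energy_delay_product(energy_j, delay_s):
    """Energy-delay product (EDP) in joule-seconds; lower is better."""
    return energy_j * delay_s

def edp_improvement(baseline, improved):
    """Ratio of baseline EDP to improved EDP; above 1 means the new link is better."""
    return energy_delay_product(*baseline) / energy_delay_product(*improved)

# Hypothetical (energy per bit in J, delay in s) pairs, not measured data.
baseline_link = (1.0e-12, 1.0e-9)
improved_link = (0.5e-12, 0.8e-9)  # half the energy, 20% less delay

ratio = edp_improvement(baseline_link, improved_link)
print(f"EDP improvement: {ratio:.2f}x")
```

    Halving energy while cutting delay by a fifth gives a ratio of 1 / (0.5 × 0.8) = 2.5, which is how moderate gains on each axis can add up to a figure in the 2.5-3X range.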

    A Study on Material Design and Electrical Reliability of Cobalt Alloy Self-Forming Diffusion Barriers for Next-Generation Semiconductor Interconnects

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Materials Science and Engineering, 2022.2. 주영창. Recently, the resistance-capacitance (RC) delay of Cu interconnects at the metal 1 (M1) level has increased rapidly because interconnect linewidths shrink as transistors scale down, and interconnect reliability has again become a severe issue. To overcome interconnect performance problems and move toward next-generation interconnect systems, metals with low resistivity (ρo) and a short electron mean free path (λ) have been studied. Metals such as cobalt (Co), ruthenium (Ru), and molybdenum (Mo) are generally mentioned as candidates for next-generation interconnect materials; since they have a low ρo × λ value, the influence of interface and surface scattering is expected to be minimized. However, harsh operating environments such as high electric fields, critical Joule heating, and pitch reduction severely degrade both the performance and the reliability of electronic devices. For example, since time dependent dielectric breakdown (TDDB) problems have recently been reported for next-generation interconnect systems, alternative barrier materials and processes must be studied to improve interconnect reliability. Specifically, extrinsic dielectric breakdown due to penetration of Co metal ions in high electric fields has been reported as a reliability problem to be solved in Co interconnect systems. Therefore, there is a need for new material system design and research on a robust diffusion barrier that prevents metal ions from penetrating into the dielectric, thereby improving the reliability of Co interconnects. Moreover, to lower the resistance of the interconnect, an ultra-thin barrier must be developed, because even a barrier with good reliability characteristics will degrade chip performance if it takes up much of the interconnect volume. 
The recommended thickness for a single diffusion barrier layer is currently reported to be less than 2.5 nm. As a result, it is essential to develop materials with both performance and reliability in mind. In this study, we designed a Co alloy self-forming barrier (SFB) material that can ensure low resistance and high reliability for Co interconnects, which are attracting attention as a next-generation interconnect system. The self-forming barrier methodology induces diffusion of an alloy dopant to the interface between the metal and the dielectric during the annealing process; the diffused dopant then reacts with the dielectric to form an ultra-thin diffusion barrier. This methodology makes it possible to improve reliability by preventing the movement of metal ions. First, material design rules were established to screen appropriate alloy dopants, and all CMOS-compatible metals were investigated. Dopant resistivity, intermetallic compound formation, solubility in Co, activity coefficient in Co, and oxidation tendency are considered as the criteria for the dopant to escape from the Co matrix and react at the Co/SiO2 interface. In addition, thermodynamic calculations were performed to predict which phases would be formed after the annealing process. Based on thermodynamic calculations, 5 dopant metals were selected and prioritized for self-forming behavior. The self-forming material was finally selected through thin film and device analysis. We confirmed that Cr, Zn, and Mn out-diffused to the surface of the thin film structure using X-ray photoelectron spectroscopy (XPS) depth profiles and investigated the chemical state of the out-diffused dopants through binding energy analysis. Cr showed the most ideal self-forming behavior with the SiO2 dielectric and reacted with oxygen to form a Cr2O3 barrier. 
In the metal-insulator-semiconductor (MIS) structure, out-diffused Cr reacts with SiO2 at the interface and forms a self-formed single layer. The thickness of the diffusion barrier layer was confirmed to be about 1.2 nm, an ultra-thin layer capable of minimizing the total effective resistance. In voltage-ramping dielectric breakdown (VRDB) tests, the Co-Cr alloy showed a breakdown voltage (VBD) up to 200% higher than pure Co. The effects of Cr doping concentration and of heat treatment conditions applicable to the interconnect process were confirmed. When Cr was doped at less than 1 at%, robust electrical reliability was exhibited. Also, a Cr2O3 interfacial layer was found to form when annealing was performed at 250 °C or higher for 30 minutes or longer. In other words, the Co-Cr alloy is well suited to the interconnect process because the current interconnect process temperature is below 400 °C. When the film thickness was lowered from 150 nm to 20 nm, excellent VBD values were confirmed even at high Cr doping concentration (~7.5 at%). The amount of Cr present at the Co/SiO2 interface appears to play a very important role in improving the quality of the Cr oxide SFB. Physical modeling is necessary to understand how the amount of Cr at the interface depends on the interconnect volume and how it affects the reliability of the Cr oxide self-forming barrier. TDDB lifetime tests were also performed, and the Co-Cr alloy interconnect showed a highly reliable diffusion barrier property of the self-formed interfacial layer. DFT analysis also confirmed that Cr2O3 is a very promising barrier material because it showed a higher energy barrier value than the TiN diffusion barrier currently being studied. A Co-based self-forming barrier was designed through thermodynamic calculations that take performance and reliability into account in the interconnect material system. A Co interconnect system with a highly reliable, ultra-thin Cr2O3 diffusion barrier is proposed. 
Through this design, it is expected that high-performance, highly reliable advanced interconnects can be implemented in the near future.
    Recently, as interconnect linewidths shrink with semiconductor device scaling, metal resistivity in the M0 and M1 levels has risen sharply, and RC delay in interconnects has once again become a major problem. To address this, materials with low resistivity and a short electron mean free path (EMFP) have been studied for next-generation interconnect systems. Metals such as Co, Ru, and Mo are typically mentioned as candidate next-generation interconnect materials; because they have low ρ0 × λ values, they are expected to minimize the effects of interface (surface) scattering and grain boundary scattering. However, operating environments with harsh electrical fields and high Joule heating degrade not only performance but also device reliability. For example, because time dependent dielectric breakdown (TDDB) reliability problems have been reported for next-generation metals, research on diffusion barrier materials and processes to remedy them is needed. In particular, an extrinsic dielectric breakdown reliability problem, in which Co ions penetrate the dielectric under high electric fields, has recently been reported. It is therefore time to develop a robust diffusion barrier that prevents metal ions from penetrating into the dielectric and thus improves Co interconnect reliability, and to design a new interconnect system. In addition, a very thin diffusion barrier is needed to lower interconnect resistance, because even a diffusion barrier with good reliability degrades overall performance if it occupies a large portion of the interconnect. The TaN layer used as a Cu diffusion barrier loses reliability sharply when it is thinner than 2.5 nm, so a robust diffusion barrier thinner than 2.5 nm must be developed. In this study, we designed a Co alloy self-forming barrier (SFB) material that can secure low resistance and high reliability for Co metal, which is attracting attention as a next-generation semiconductor interconnect material. In the self-forming barrier methodology, the dopant diffuses to the interface between the metal and the dielectric during annealing, and the diffused dopant forms a thin diffusion barrier. This methodology was expected to improve Co interconnect reliability by preventing the movement of metal ions. First, metals applicable to CMOS processes were screened to find suitable dopants for Co alloys. Dopant resistivity, intermetallic compound formation, solubility in Co, activity coefficient in the Co alloy, oxidation tendency, and the stable phase at the Co/SiO2 interface were established as material selection criteria through thermodynamic calculations. Based on thermodynamic calculations, 9 dopant metals were selected and prioritized according to the Co alloy self-forming barrier criteria. Finally, the most suitable self-forming barrier material was selected through thin film and device reliability evaluation. Using X-ray photoelectron spectroscopy (XPS) analysis, we checked whether Cr, Zn, and Mn out-diffused to the surface of the thin film structure and investigated the chemical state of the out-diffused dopants through binding energy analysis. The analysis confirmed that Cr, Zn, and Mn diffused to the dielectric interface and reacted with oxygen to form oxide/silicate diffusion barriers (e.g. Cr2O3, Zn2SiO4, MnSiO3). Among them, Cr showed the most ideal self-forming behavior with the SiO2 dielectric and was confirmed to react with oxygen to form a Cr2O3 layer. In the MIS (Metal-Insulator-Semiconductor) structure as well, out-diffused Cr reacted with SiO2 at the interface to form a Cr2O3 self-forming barrier. The barrier layer is about 1.2 nm thick, thin enough to minimize the total effective resistance. In VRDB (Voltage-Ramping Dielectric Breakdown) tests, the Co-Cr alloy showed a breakdown voltage up to 200% higher than pure Co. The effects of Cr doping concentration and annealing conditions applicable to the semiconductor interconnect process were examined. Excellent electrical reliability was observed when Cr was doped at less than 1 at%. In addition, a Cr2O3 interfacial layer was found to form when annealing was performed at 250 °C or higher for 30 minutes or longer. That is, because the current interconnect process temperature is below 400 °C, the Co-Cr alloy was confirmed to be applicable to the interconnect process. TDDB lifetime tests were also performed, and the Co-Cr alloy interconnect showed very stable diffusion barrier properties of the self-formed interfacial layer. DFT analysis showed that the Cr2O3 self-forming barrier is a very promising diffusion barrier because it exhibits a higher energy barrier value than the TiN diffusion barrier currently being studied. In this study, a Co-based self-forming barrier was designed through thermodynamic calculations that take performance and reliability into account in the semiconductor interconnect material system. Based on the experimental results, a Co-Cr alloy with a highly reliable and very thin Cr2O3 diffusion barrier is proposed. Through material design and electrical reliability verification, the Co/Cr2O3/SiO2 material system is proposed, and it is expected to be implemented in upcoming next-generation interconnects.
    Contents: Abstract; Table of Contents; List of Tables; List of Figures.
    Chapter 1. Introduction: 1.1. Scaling down of VLSI systems; 1.2. Driving force of interconnect system evolution; 1.3. Driving force of beyond Cu interconnects; 1.4. Objective of the thesis; 1.5. Organization of the thesis.
    Chapter 2. Theoretical Background: 2.1. Evolution of interconnect systems; 2.1.1. Cu/barrier/low-k interconnect system; 2.1.2. Process developments for interconnect reliability; 2.1.3. 3rd generation of interconnect system; 2.2. Thermodynamic tools for Co self-forming barrier; 2.2.1. Binary phase diagram; 2.2.2. Ellingham diagram; 2.2.3. Activity coefficient; 2.3. Reliability of Interconnects; 2.3.1. Current conduction mechanisms in dielectrics; 2.3.2. Reliability test vehicles; 2.3.3. Dielectric breakdown assessment; 2.3.4. Dielectric breakdown mechanisms; 2.3.5. Reliability test: VRDB and TDDB; 2.3.6. Lifetime models.
    Chapter 3. Experimental Procedures: 3.1. Thin film deposition; 3.1.1. Substrate preparation; 3.1.2. Oxidation; 3.1.3. Co alloy deposition using DC magnetron sputtering; 3.1.4. Annealing process; 3.2. Thin film characterization; 3.2.1. Sheet resistance; 3.2.2. X-ray photoelectron spectroscopy (XPS); 3.3. Metal-Insulator-Semiconductor (MIS) device fabrication; 3.3.1. Patterning using lift-off process; 3.3.2. TDDB packaging; 3.4. Reliability analysis; 3.4.1. Electrical reliability analysis; 3.4.2. Transmission electron microscopy (TEM) analysis; 3.5. Computation; 3.5.1. FactSage™ calculation; 3.5.2. Density Functional Theory (DFT) calculation.
    Chapter 4. Co Alloy Design for Advanced Interconnects: 4.1. Material design of Co alloy self-forming barrier; 4.1.1. Rule of thumb of Co-X alloy; 4.1.2. Co alloy phase; 4.1.3. Out-diffusion stage; 4.1.4. Reaction step with SiO2 dielectric; 4.1.5. Comparison criteria; 4.2. Comparison of Co alloy candidates; 4.2.1. Thin film resistivity evaluation; 4.2.2. Self-forming behavior using XPS depth profile analysis; 4.2.3. MIS device reliability test; 4.3. Summary.
    Chapter 5. Co-Cr Alloy Interconnect with Robust Self-Forming Barrier: 5.1. Compatibility of Co-Cr alloy SFB process; 5.1.1. Effect of Cr doping concentration; 5.1.2. Annealing process condition optimization; 5.2. Reliability of Co-Cr interconnects; 5.2.1. VRDB quality test with Co-Cr alloys; 5.2.2. Lifetime evaluation using TDDB method; 5.2.3. Barrier mechanism using DFT; 5.3. Summary.
    Chapter 6. Conclusion: 6.1. Summary of results; 6.2. Research perspectives.
    References; Abstract (In Korean); Curriculum Vitae.

    Power-constrained aware and latency-aware microarchitectural optimizations in many-core processors

    As transistor budgets outpace the power envelope (the power-wall issue), new architectural and microarchitectural techniques are needed to improve, or at least maintain, the power efficiency of next-generation processors. Run-time adaptation, including core, cache and DVFS adaptations, has recently emerged as a promising way to sustain acceptable power efficiency. However, none of the adaptation techniques proposed so far provides good results under the stringent power budgets that will be common in the next decades, so new techniques that attack the problem from several fronts using different specialized mechanisms are necessary. The combination of different power management mechanisms, however, brings extra levels of complexity, since other factors such as workload behavior and run-time conditions must also be considered to properly allocate power among cores and threads. To address the power issue, this thesis first proposes Chrysso, an integrated and scalable model-driven power management scheme that quickly selects the best combination of adaptation methods out of different core and uncore micro-architecture adaptations, per-core DVFS, or any combination thereof. Chrysso can quickly search the adaptation space by making performance/power projections to identify Pareto-optimal configurations, effectively pruning the search space. Chrysso achieves 1.9x better chip performance than core-level gating for multi-programmed workloads, and 1.5x higher performance for multi-threaded workloads. Most existing power management schemes use a centralized approach to regulate power dissipation. Unfortunately, the complexity and overhead of centralized power management increase significantly with core count, rendering it unviable at fine-grain time slices. The work leverages a two-tier hierarchical power manager. 
This solution is highly scalable, with low overhead, on a tiled many-core architecture with shared LLC and per-tile DVFS at fine-grain time slices. The global power is first distributed across tiles using GPM and then within a tile (in parallel across all tiles). Additionally, this work proposes DVFS and cache-aware thread migration (DCTM) to ensure optimum per-tile co-scheduling of compatible threads at runtime over the two-tier hierarchical power manager. DCTM outperforms existing solutions by up to 12% on an adaptive many-core tiled processor. With advancements in core micro-architectural techniques and technology scaling, the performance gap between the computational component and the memory component is increasing significantly (the memory-wall issue). To bridge this gap, the architecture community is pushing toward multi-core architectures with on-die, near-memory DRAM caches (faster than conventional DRAM). Gigascale DRAM Caches pose the problem of how to manage the tags efficiently. Tags-in-DRAM designs aim to co-locate tags with data efficiently, but they still suffer from high latency, especially with multi-way associativity. The thesis finally proposes the Tag Cache mechanism, an on-chip distributed tag caching mechanism with limited space and latency overhead that bypasses the tag read operation in multi-way DRAM Caches, thereby reducing hit latency. Each Tag Cache, stored in L2, stores tag information of the most recently used DRAM Cache ways. The Tag Cache is able to exploit the temporal locality of the DRAM Cache, thereby contributing on average to 46% of the DRAM Cache hits. As transistor power consumption exceeds the desirable power level, new architectural and microarchitectural techniques are needed to improve, or at least maintain, the energy efficiency of next-generation processors. 
Run-time adaptation of cores and caches, as well as DVFS adaptations, are recently emerged ideas that point to a promising area for maintaining an acceptable pace of energy efficiency. However, none of the adaptation techniques proposed so far can deliver good results under the strict power constraints that will be common in the coming decades. New techniques are needed that attack the problem from several fronts using different specialized mechanisms. Combining different power management mechanisms brings additional levels of complexity, since other factors such as workload behavior and specific run-time conditions must also be considered to properly allocate power among the cores of the computing system. To address the power issue, this thesis first proposes Chrysso, an integrated and scalable power management scheme that quickly selects the best combination among different microarchitectural adaptations. Chrysso can quickly search for the appropriate adaptation by making optimal performance and power projections based on Pareto configurations, thereby effectively reducing the search space. Chrysso achieves 1.9x the performance of conventional gating techniques with a workload of sequential applications, and 1.5x when the applications are parallel programs. Most existing power management systems use a centralized approach to regulate power dissipation. Unfortunately, complexity and management time increase significantly with a large number of cores. This work defines a two-level hierarchical power manager. 
This solution is highly scalable, with low operating cost, on an architecture of multiple cores integrated in clusters, with a last-level cache shared at cluster level and DVFS set at fine-grain time intervals at cluster level. Global power is first distributed across clusters using GPM and then distributed within a cluster (in parallel across all clusters). In addition, this work proposes DVFS and cache-aware thread migration (DCTM), which ensures an optimal distribution of tasks among cores. DCTM outperforms existing solutions by up to 12%. With advances in technology and in core microarchitecture techniques, the performance gap between the computational component and memory is increasing significantly. To bridge this gap, the field is moving toward multi-core architectures with integrated DRAM-based caches. These large-scale DRAM caches pose the problem of how to manage tags efficiently. Cache designs that store data and tags together are a first step, but they still suffer from high latency, especially in caches with a high degree of associativity. This thesis proposes the study of a technique called Tag Cache, a distributed tag storage mechanism that reduces the latency of tag read operations in DRAM caches. Each Tag Cache, which resides in L2, stores the information of the recently accessed ways of the DRAM caches. In this way, the temporal locality of a DRAM cache can be exploited, which contributes on average to 46% of DRAM cache hits.
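The Pareto-based pruning that the abstract attributes to Chrysso can be illustrated with a minimal sketch. The configuration names, power and performance numbers, and the function names are hypothetical; the abstract does not specify the projection model or selection algorithm, so this shows only the general idea of discarding dominated configurations before choosing one under a power budget.

```python
import math
from typing import NamedTuple, Optional

class Config(NamedTuple):
    name: str
    power_w: float   # projected power in watts (hypothetical values below)
    perf: float      # projected performance, arbitrary units

def pareto_front(configs):
    """Drop every config that another config beats or ties on both axes."""
    front = []
    best_perf = -math.inf
    # Sort by ascending power; on ties, the higher-performance config comes first.
    for c in sorted(configs, key=lambda c: (c.power_w, -c.perf)):
        if c.perf > best_perf:  # strictly better than anything cheaper
            front.append(c)
            best_perf = c.perf
    return front

def best_under_budget(configs, budget_w) -> Optional[Config]:
    """Highest-performance Pareto-optimal config that fits the power budget."""
    feasible = [c for c in pareto_front(configs) if c.power_w <= budget_w]
    return max(feasible, key=lambda c: c.perf) if feasible else None

# Hypothetical projections for one core.
candidates = [
    Config("gated", 2.0, 1.0),
    Config("dvfs-low", 1.5, 1.2),
    Config("dvfs-mid", 2.5, 1.6),
    Config("wide-core", 3.0, 1.5),
    Config("dvfs-high", 4.0, 2.0),
]
front_names = [c.name for c in pareto_front(candidates)]
choice = best_under_budget(candidates, 3.0)
```

Only the non-dominated configurations need to be considered when a budget is assigned, which is what makes the search space smaller than the full set of adaptation combinations.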