43 research outputs found
Algorithms and methodologies for interconnect reliability analysis of integrated circuits
The phenomenal progress of computing devices has been largely made possible by the sustained efforts of semiconductor industry in innovating techniques for extremely large-scale integration. Indeed, gigantically integrated circuits today contain multi-billion interconnects which enable the transistors to talk to each other -all in a space of few mm2. Such aggressively downscaled components (transistors and interconnects) silently suffer from increasing electric fields and impurities/defects during manufacturing. Compounded by the Gigahertz switching, the challenges of reliability and design integrity remains very much alive for chip designers, with Electro migration (EM) being the foremost interconnect reliability challenge.
Traditionally, EM containment revolves around EM guidelines, generated at single-component level, whose non-compliance means that the component fails. Failure usually refers to deformation due to EM -manifested in form of resistance increase, which is unacceptable from circuit performance point of view. Subsequent aspects deal with correct-by-construct design of the chip followed by the signoff-verification of EM reliability. Interestingly, chip designs today have reached a dilemma point of reduced margin between the actual and reliably allowed current densities, versus, comparatively scarce system-failures. Consequently, this research is focused on improved algorithms and methodologies for interconnect reliability analysis enabling accurate and design-specific interpretation of EM events.
In the first part, we present a new methodology for logic-IP (cell) internal EM verification: an inadequately attended area in the literature. Our SPICE-correlated model helps in evaluating the cell lifetime under any arbitrary reliability speciation, without generating additional data - unlike the traditional approaches. The model is apt for today's fab less eco-system, where there is a) increasing reuse of standard cells optimized for one market condition to another (e.g., wireless to automotive), as well as b) increasing 3rd party content on the chip requiring a rigorous sign-off. We present results from a 28nm production setup, demonstrating significant violations relaxation and flexibility to allow runtime level reliability retargeting.
Subsequently, we focus on an important aspect of connecting the individual component-level failures to that of the system failure. We note that existing EM methodologies are based on serial reliability assumption, which deems the entire system to fail as soon as the first component in the system fails. With a highly redundant circuit topology, that of a clock grid, in perspective, we present algorithms for EM assessment, which allow us to incorporate and quantify the benefit from system redundancies. With the skew metric of clock-grid as a failure criterion, we demonstrate that unless such incorporations are done, chip lifetimes are underestimated by over 2x.
This component-to-system reliability bridge is further extended through an extreme order statistics based approach, wherein, we demonstrate that system failures can be approximated by an asymptotic kth-component failure model, otherwise requiring costly Monte Carlo simulations. Using such approach, we can efficiently predict a system-criterion based time to failure within existing EDA frameworks.
The last part of the research is related to incorporating the impact of global/local process variation on current densities as well as fundamental physical factors on EM. Through Hermite polynomial chaos based approach, we arrive at novel variations-aware current density models, which demonstrate significant margins (> 30 %) in EM lifetime when compared with the traditional worst case approach. The above research problems have been motivated by the decade-long work experience of the author dealing with reliability issues in industrial SoCs, first at Texas Instruments and later at Qualcomm.L'espectacular progrés dels dispositius de cà lcul ha estat possible en gran part als esforços de la indústria dels semiconductors en proposar tècniques innovadores per circuits d'una alta escala d'integració. Els circuits integrats contenen milers de milions d'interconnexions que permeten connectar transistors dins d'un espai de pocs mm2. Tots aquests components estan afectats per camps elèctrics, impureses i defectes durant la seva fabricació. Degut a l’activitat a nivell de Gigahertzs, la fiabilitat i integritat són reptes importants pels dissenyadors de xips, on la Electromigració (EM) és un dels problemes més importants. Tradicionalment, el control de la EM ha girat entorn a directrius a nivell de component. L'incompliment d’alguna de les directrius implica un alt risc de falla. Per falla s'entén la degradació deguda a la EM, que es manifesta en forma d'augment de la resistència, la qual cosa és inacceptable des del punt de vista del rendiment del circuit. Altres aspectes tenen a veure amb la correcta construcció del xip i la verificació de fiabilitat abans d’enviar el xip a fabricar. Avui en dia, el disseny s’enfronta a dilemes importants a l’hora de definir els marges de fiabilitat dels xips. És un compromÃs entre eficiència i fiabilitat. La recerca en aquesta tesi se centra en la proposta d’algorismes i metodologies per a l'anà lisi de la fiabilitat d'interconnexió que permeten una interpretació precisa i especÃfica d'esdeveniments d'EM. A la primera part de la tesi es presenta una nova metodologia pel disseny correcte-per-construcció i verificació d’EM a l’interior de les cel·les lògiques. Es presenta un model SPICE correlat que ajuda a avaluar el temps de vida de les cel·les segons qualsevol especificació arbitrà ria de fiabilitat i sense generar cap dada addicional, al contrari del que fan altres tècniques. El model és apte per l'ecosistema d'empreses de disseny quan hi ha a) una reutilització creixent de cel·les està ndard optimitzades per unes condicions de mercat i utilitzades en un altre (p.ex. de wireless a automoció), o b) la utilització de components del xip provinents de terceres parts i que necessiten una verificació rigorosa. Es presenten resultats en una tecnologia de 28nm, demostrant relaxacions significatives de les regles de fiabilitat i flexibilitat per permetre la reavaluació de la fiabilitat en temps d'execució. A continuació, el treball tracta un aspecte important sobre la relació entre les falles dels components i les falles del sistema. S'observa que les tècniques existents es basen en la suposició de fiabilitat en sèrie, que porta el sistema a fallar tant aviat hi ha un component que falla. Pensant en topologies redundants, com la de les graelles de rellotge, es proposen algorismes per l'anà lisi d'EM que permeten quantificar els beneficis de la redundà ncia en el sistema. Utilitzant com a mètrica l’esbiaixi del senyal de rellotge, es demostra que la vida dels xips pot arribar a ser infravalorada per un factor de 2x. Aquest pont de fiabilitat entre component i sistema es perfecciona a través d'una tècnica basada en estadÃstics d'ordre extrem on es demostra que les falles poden ser aproximades amb un model asimptòtic de fallada de l'ièssim component, evitant aixà simulacions de Monte Carlo costoses. Amb aquesta tècnica, es pot predir eficientment el temps de fallada a nivell de sistema utilitzant eines industrials. La darrera part de la recerca està relacionada amb avaluar l'impacte de les variacions de procés en les densitats de corrent i factors fÃsics de la EM. Mitjançant una tècnica basada en polinomis d'Hermite s'han obtingut uns nous models de densitat de corrent que mostren millores importants (>30%) en l'estimació de la vida del sistema comprades amb les tècniques basades en el cas pitjor. La recerca d'aquesta tesi ha estat motivada pel treball de l'autor durant més d'una dècada tractant temes de fiabilitat en sistemes, primer a Texas Instruments i després a Qualcomm.Postprint (published version
Recommended from our members
Physics-Based Electromigration Modeling and Analysis and Optimization
Long-term reliability is a major concern in modern VLSI design. Literature has shown that reliability gets worse as technology advances. It is expected that the future VLSI systems would have shorter reliability-induced lifetime comparing with previous generations. Being one of the most serious reliability effects, electromigration (EM) is a physical phenomenon of the migration of metal atoms due to the momentum exchange between atoms and the conducting electrons. It can cause wire resistance change or open circuit and result in functional failure of the circuit. Power-ground networks are the most vulnerable part to EM effect among all the interconnect wires since the current flow on this part is the largest on the chip. With new generation oftechnology node and aggressive design strategies, more accurate and efficient EM models are required. However, traditional EM approaches are very conservative and cannot meet current aggressive design strategies. Besides circuit level, EM also need to be thoroughly studied in system level due to limited power and temperature budgets among cores on chip. This research focuses on developing physical level EM model for VLSI circuits and system level EM optimization for multi-core systems in order to overcome the aforementioned problems. Specifically, for physical level, we develop two EM immortality check methods and a power grid EM check method. Firstly, a voltage based EM immortality analysis has been developed. Immortality condition in nucleation phase can be determined fast and accurately for multi-segment interconnect wires. Secondly, a saturation volume based incubation phase immortality check method has been proposed. This method can further reduce the redundancy in VLSI circuit design by immortality check in multiphase. Furthermore, both immortality check methods are integrated into a new power grid EM check methodology (EMspice) as filter for EM analysis. These filters can accelerate the simulation by filtering out immortal trees so that we only need to do simulation on fewer trees that are mortal. Coupled EM simulation considering both hydrostatic stress and electronic current/voltage in the power grid network will be applied to these mortal trees. This tool can work seamlessly with commercial synthesis flow. Besides physical level reliability models, system level reliability optimization is also discussed in this research. A deep reinforcement learning based EM optimization has been proposed for multi-core system. Both long term reliability effect (hard error) and transient soft error are considered. Energy can be optimized with all the reliability and other constraints fast and accurately compared to existing reliability management techniques. Last but not least, a scheduling based reliability optimization method for multi-core systems has been proposed. NBTI, HCI and EM are considered jointly. Lifetime of the system can be improved significantly compared to traditional methods which mainly focus on utilization
Recommended from our members
Electromigration modeling and layout optimization for advanced VLSI
textElectromigration (EM) is a critical problem for interconnect reliability in advanced VLSI design. Because EM is a strong function of current density, a smaller cross-sectional area of interconnects can degrade the EM-related lifetime of IC, which is expected to become more severe in future technology nodes. Moreover, as EM is governed by various factors such as temperature, material property, geometrical shape, and mechanical stress, different interconnect structures can have distinct EM issues and solutions to mitigate them. For example, one of the most prominent technologies, die stacking technology of three-dimensional (3D) ICs, can have different EM problems from that of planer ICs, due to their unique interconnects such as through-silicon vias (TSVs).
This dissertation investigates EM in various interconnect structures, and applies the EM models to optimize IC layout. First, modeling of EM is developed for chip-level interconnects, such as wires, local vias, TSVs, and multi-scale vias (MSVs). Based on the models, fast and accurate EM-prediction methods are proposed for the chip-level designs. After that, by utilizing the EM-prediction methods, the layout optimization methods are suggested, such as EM-aware routing for 3D ICs and EM-aware redundant via insertion for the future technology nodes in VLSI.
Experimental results show that the proposed EM modeling approaches enable fast and accurate EM evaluation for chip design, and the EM-aware layout optimization methods improve EM-robustness of advanced VLSI designs.Electrical and Computer Engineerin
The impact of design techniques in the reduction of power consumption of SoCs Multimedia
Orientador: Guido Costa Souza de AraújoDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: A indústria de semicondutores sempre enfrentou fortes demandas em resolver problema de dissipação de calor e reduzir o consumo de energia em dispositivos. Esta tendência tem sido intensificada nos últimos anos com o movimento de sustentabilidade ambiental. A concepção correta de um sistema eletrônico de baixo consumo de energia é um problema de vários nÃveis de complexidade e exige estratégias sistemáticas na sua construção. Fora disso, a adoção de qualquer técnica de redução de energia sempre está vinculada com objetivos especiais e provoca alguns impactos no projeto. Apesar dos projetistas conheçam bem os impactos de forma qualitativa, as detalhes quantitativas ainda são incógnitas ou apenas mantidas dentro do 'know-how' das empresas. Neste trabalho, de acordo com resultados experimentais baseado num plataforma de SoC1 industrial, tentamos quantificar os impactos derivados do uso de técnicas de redução de consumo de energia. Nos concentramos em relacionar o fator de redução de energia de cada técnica aos impactos em termo de área, desempenho, esforço de implementação e verificação. Na ausência desse tipo de dados, que relacionam o esforço de engenharia com as metas de consumo de energia, incertezas e atrasos serão frequentes no cronograma de projeto. Esperamos que este tipo de orientações possam ajudar/guiar os arquitetos de projeto em selecionar as técnicas adequadas para reduzir o consumo de energia dentro do alcance de orçamento e cronograma de projetoAbstract: The semiconductor industry has always faced strong demands to solve the problem of heat dissipation and reduce the power consumption in electronic devices. This trend has been increased in recent years with the action of environmental sustainability. The correct conception of an electronic system for low power consumption is an issue with multiple levels of complexities and requires systematic approaches in its construction. However, the adoption of any technique for reducing the power consumption is always linked with some specific goals and causes some impacts on the project. Although the designers know well that these impacts can affect the design in a quality aspect, the quantitative details are still unkown or just be kept inside the company's know-how. In this work, according to the experimental results based on an industrial SoC2 platform, we try to quantify the impacts of the use of low power techniques. We will relate the power reduction factor of each technique to the impact in terms of area, performance, implementation and verification effort. In the absence of such data, which relates the engineering effort to the goals of power consumption, uncertainties and delays are frequent. We hope that such guidelines can help/guide the project architects in selecting the appropriate techniques to reduce the power consumption within the limit of budget and project scheduleMestradoCiência da ComputaçãoMestre em Ciência da Computaçã
Recommended from our members
Interconnect Aging—Physics to Software
Device reliability or lifetime is often non-negotiable and crucial for sensitive applications such as medical devices, autonomous vehicles and space crafts. Inevitable technology advancement (e.g. miniaturization) has added unwelcome complications and unpredictability to the aging problem. Reliability of VLSI chips is jeopardized by mass transport in metallic interconnects. Material migration is caused by electrical, mechanical and thermal phenomena, and, therefore, is a complicated process. While all aspects of material migration have been studied, a comprehensive investigation that can explain and include all those phenomena simultaneously remains unsolved. Inaccuracies in modeling and predicting aging processes in wires cause that chipmakers often overdesign interconnects. This is an undesirable and expensive approach in terms of time and cost. In modern technologies, the predicted lifetime, aging, and failure mechanisms in interconnect very often do not match the observed behaviors. Unrealistic models used in CAD tools are the main culprit of such incompatibilities. In general, two situations may occur: (1) in some cases, the models may wrongly scrutinize reliability in unfailing parts and consequently impose unnecessary design tightening and (2) in some other cases, the models may underestimate serious reliability problems causing unpredicted behaviors or catastrophic failures to occur. The existing models for reliability evaluation are usually pessimistic in case of interconnect voiding and optimistic when extrusions occur. Time-consuming and not converging reliability assessments, as well as undesired chip behaviors, are the common expensive outcome of such models.We revisit the underlying physics of aging processes in dual-damascene copper lines. We demonstrate, that the simplistic modeling is the cause of the incompatibility of the existing models. We study all three main aging processes: electromigration, thermo-migration, and stress migration and offer several comprehensive yet compact models for realistic assessment of interconnect aging. These models explain many observations that have been inexplicable for decades. Ultimately, a computer-aided design tool, RAIN, is developed based on the proposed models and is capable of assessing the reliability of industry standard complex multi-layer, multi-segment interconnect networks. This tool can be readily integrated into other verification signoffs phases such as performance, timing, and power analyses. RAIN takes as inputs: (1) interconnect design, (2) technology specifications, (3) initial stress and temperature, (4) IR drop and lifetime requirements. It analyzes and assesses reliability and delivery requirements of all nets, and provides a report on voltage limitations, thermal violations and expected lifetime. It is validated on a wide spectrum of experimental results performed on various industry benchmarks
Recommended from our members
Simulation for Reliability, Hardware Security, and Ising Computing in VLSI Chip Design
The continued scaling of VLSI circuits has provided a wealth of opportunities andchallenges to the VLSI circuit design area. Both these challenges and opportunities, however,require new simulation tools that can enable their solution or exploitation as classicalmethods typically dealt with problem domains with smaller scales or less complexity. Inthis dissertation, simulation methods are presented to address the emerging VLSI designtopics of Electromigration induced aging and Ising computing and are then applied to theapplication areas of hardware security and graph partitioning respectively.The Electromigration aging effect in VLSI circuits is a long-term reliability issueaffecting current carrying metal wires leading to IR drop degradation. Typically, simpleanalytical equations can determine a wire’s effective age or if it will be affected by the EMaging effect at all. However, these classical methods are overly conservative and can lead toover design or unnecessary design iterations. Furthermore, it is expected that the EM agingeffect will become more severe in future Integrated Cirucits (ICs) due to increasing currentdensities and the prevalance of polycrystaline copper atom structures seen at small wiredimensions. For this reason, more comprehensive simulation techniques that can efficientlysimulate the EM effect with less conservative results can help mitigate overdesign andincrease design margins while reducing design iterations.The area of Hardware Security is becoming increasingly important as the chipsupply chain becomes more globalized and the integrity of chips becomes more diffiuclt toverify. Utilizing the accurate simulation techniques for EM, we can utilize this reliabilityeffect to demonstrate how a reliability based attack could be perpatrated. Furthermore, wecan utilize this aging effect as a defense mechanism to help us validate the integrity of anIC and detect counterfeit chips in the component supply chain market.Ising computing is an emerging method of solving combinatorial optimization problemsby simulating the interactions of so-called spin glasses and their interactions. Borrowingconcepts from quantum computing, this methods mimics the quantum interaction betweenspin glasses in such a way that finding a ground state of these spin glass models leadsto the solution of a particular problem. In this dissertation, effective methods of simulatingthe spin glass interactions using General Purpose Graphics Processing Units (GPGPUs)and finding their ground state are developed.In addition to the GPU based Ising model simulations, important combinatorialproblems can be mapped to the Ising model. In this dissertation the Ising solver is appliedto graph partitioning which can be utilized in VLSI design and many other domains as well.Specifically, solvers for the maxcut problem and the balanced min-cut partitioning problemare developed
Dependable Embedded Systems
This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. This book introduces the most prominent reliability concerns from today’s points of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level such circuit level or system level alone, the focus of this book is to deal with the different reliability challenges across different levels starting from the physical level all the way to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solution can be proposed to ef-fectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, soft errors, etc. Provides readers with latest insights into novel, cross-layer methods and models with respect to dependability of embedded systems; Describes cross-layer approaches that can leverage reliability through techniques that are pro-actively designed with respect to techniques at other layers; Explains run-time adaptation and concepts/means of self-organization, in order to achieve error resiliency in complex, future many core systems
Physical parameter-aware Networks-on-Chip design
PhD ThesisNetworks-on-Chip (NoCs) have been proposed as a scalable, reliable
and power-efficient communication fabric for chip multiprocessors
(CMPs) and multiprocessor systems-on-chip (MPSoCs). NoCs determine
both the performance and the reliability of such systems, with a
significant power demand that is expected to increase due to developments
in both technology and architecture. In terms of architecture, an
important trend in many-core systems architecture is to increase the
number of cores on a chip while reducing their individual complexity.
This trend increases communication power relative to computation
power. Moreover, technology-wise, power-hungry wires are dominating
logic as power consumers as technology scales down. For these
reasons, the design of future very large scale integration (VLSI) systems
is moving from being computation-centric to communication-centric.
On the other hand, chip’s physical parameters integrity, especially
power and thermal integrity, is crucial for reliable VLSI systems. However,
guaranteeing this integrity is becoming increasingly difficult with
the higher scale of integration due to increased power density and operating
frequencies that result in continuously increasing temperature
and voltage drops in the chip. This is a challenge that may prevent
further shrinking of devices. Thus, tackling the challenge of power
and thermal integrity of future many-core systems at only one level
of abstraction, the chip and package design for example, is no longer
sufficient to ensure the integrity of physical parameters. New designtime
and run-time strategies may need to work together at different
levels of abstraction, such as package, application, network, to provide
the required physical parameter integrity for these large systems. This
necessitates strategies that work at the level of the on-chip network
with its rising power budget.
This thesis proposes models, techniques and architectures to improve
power and thermal integrity of Network-on-Chip (NoC)-based
many-core systems. The thesis is composed of two major parts: i)
minimization and modelling of power supply variations to improve
power integrity; and ii) dynamic thermal adaptation to improve thermal
integrity. This thesis makes four major contributions. The first is
a computational model of on-chip power supply variations in NoCs.
The proposed model embeds a power delivery model, an NoC activity
simulator and a power model. The model is verified with SPICE simulation
and employed to analyse power supply variations in synthetic
and real NoC workloads. Novel observations regarding power supply
noise correlation with different traffic patterns and routing algorithms
are found. The second is a new application mapping strategy aiming
vii
to minimize power supply noise in NoCs. This is achieved by defining
a new metric, switching activity density, and employing a force-based
objective function that results in minimizing switching density. Significant
reductions in power supply noise (PSN) are achieved with a low
energy penalty. This reduction in PSN also results in a better link timing
accuracy. The third contribution is a new dynamic thermal-adaptive
routing strategy to effectively diffuse heat from the NoC-based threedimensional
(3D) CMPs, using a dynamic programming (DP)-based distributed
control architecture. Moreover, a new approach for efficient extension
of two-dimensional (2D) partially-adaptive routing algorithms
to 3D is presented. This approach improves three-dimensional networkon-
chip (3D NoC) routing adaptivity while ensuring deadlock-freeness.
Finally, the proposed thermal-adaptive routing is implemented in
field-programmable gate array (FPGA), and implementation challenges,
for both thermal sensing and the dynamic control architecture are addressed.
The proposed routing implementation is evaluated in terms
of both functionality and performance.
The methodologies and architectures proposed in this thesis open a
new direction for improving the power and thermal integrity of future
NoC-based 2D and 3D many-core architectures
Internet of Things. Information Processing in an Increasingly Connected World
This open access book constitutes the refereed post-conference proceedings of the First IFIP International Cross-Domain Conference on Internet of Things, IFIPIoT 2018, held at the 24th IFIP World Computer Congress, WCC 2018, in Poznan, Poland, in September 2018. The 12 full papers presented were carefully reviewed and selected from 24 submissions. Also included in this volume are 4 WCC 2018 plenary contributions, an invited talk and a position paper from the IFIP domain committee on IoT. The papers cover a wide range of topics from a technology to a business perspective and include among others hardware, software and management aspects, process innovation, privacy, power consumption, architecture, applications