
    Algorithms and methodologies for interconnect reliability analysis of integrated circuits

    The phenomenal progress of computing devices has been made possible largely by the semiconductor industry's sustained innovation in techniques for extremely large-scale integration. Today's integrated circuits contain billions of interconnects that let the transistors communicate with each other, all within a few mm². Such aggressively downscaled components (transistors and interconnects) silently suffer from increasing electric fields and from impurities and defects introduced during manufacturing. Compounded by gigahertz switching, the challenges of reliability and design integrity remain very much alive for chip designers, with electromigration (EM) being the foremost interconnect reliability challenge. Traditionally, EM containment revolves around EM guidelines generated at the single-component level; a component that does not comply is deemed to fail. Failure usually refers to deformation due to EM, manifested as a resistance increase that is unacceptable from a circuit-performance point of view. Subsequent aspects deal with correct-by-construction design of the chip, followed by signoff verification of EM reliability. Interestingly, chip designs today have reached a dilemma: the margin between actual and reliably allowed current densities keeps shrinking, yet system failures remain comparatively scarce. Consequently, this research focuses on improved algorithms and methodologies for interconnect reliability analysis that enable accurate, design-specific interpretation of EM events. In the first part, we present a new methodology for logic-IP (cell) internal EM verification, an area that has received inadequate attention in the literature. Our SPICE-correlated model helps evaluate the cell lifetime under any arbitrary reliability specification without generating additional data, unlike traditional approaches.
The model suits today's fabless ecosystem, where there is (a) increasing reuse of standard cells optimized for one market in another (e.g., wireless to automotive), and (b) increasing third-party content on the chip that requires rigorous sign-off. We present results from a 28 nm production setup, demonstrating significant relaxation of violations and the flexibility to retarget reliability at run time. Subsequently, we focus on connecting individual component-level failures to system failure. We note that existing EM methodologies are based on a serial reliability assumption, which deems the entire system to fail as soon as the first component fails. With a highly redundant circuit topology in view, that of a clock grid, we present algorithms for EM assessment that allow us to incorporate and quantify the benefit of system redundancies. Using the clock-grid skew metric as the failure criterion, we demonstrate that unless such redundancies are incorporated, chip lifetimes are underestimated by over 2x. This component-to-system reliability bridge is further extended through an approach based on extreme order statistics, in which we demonstrate that system failures can be approximated by an asymptotic kth-component failure model, avoiding the costly Monte Carlo simulations otherwise required. Using this approach, we can efficiently predict a system-criterion-based time to failure within existing EDA frameworks. The last part of the research incorporates the impact of global and local process variation on current densities, as well as of fundamental physical factors, on EM. Through a Hermite polynomial chaos based approach, we arrive at novel variation-aware current density models, which demonstrate significant margins (> 30%) in EM lifetime compared with the traditional worst-case approach.
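The gap between the serial (first-failure) assumption and a kth-component failure criterion can be illustrated with a brute-force Monte Carlo baseline, the kind of simulation the asymptotic order-statistics model is meant to avoid. The sketch below is not the thesis's algorithm. It assumes independent, identically distributed lognormal component lifetimes, and the function name `simulate_ttf`, its parameters, and the choice of a fixed `k` as a stand-in for a system-level criterion such as clock skew are all hypothetical.

```python
import random
import statistics


def simulate_ttf(n_components, k, trials=2000, mu=0.0, sigma=0.5, seed=0):
    """Estimate median time to the first and to the k-th component failure.

    Assumption (not from the source): component lifetimes are i.i.d.
    lognormal with log-mean `mu` and log-standard-deviation `sigma`.
    """
    if not 1 <= k <= n_components:
        raise ValueError("k must satisfy 1 <= k <= n_components")

    rng = random.Random(seed)  # seeded for reproducible runs
    first_failures = []
    kth_failures = []
    for _ in range(trials):
        # Draw one lifetime per component and sort to get order statistics.
        lifetimes = sorted(
            rng.lognormvariate(mu, sigma) for _ in range(n_components)
        )
        first_failures.append(lifetimes[0])    # serial-reliability view
        kth_failures.append(lifetimes[k - 1])  # k-th failure view
    return statistics.median(first_failures), statistics.median(kth_failures)


if __name__ == "__main__":
    t_first, t_kth = simulate_ttf(n_components=200, k=5)
    print(f"median time to first failure: {t_first:.3f}")
    print(f"median time to 5th failure:   {t_kth:.3f}")
    print(f"lifetime ratio (kth / first): {t_kth / t_first:.2f}")
```

Under these assumptions the kth-failure lifetime always exceeds the first-failure lifetime, which is the effect the abstract describes when redundancy is taken into account. The size of the ratio depends entirely on the assumed distribution and on `k`, so it should not be compared with the 2x figure reported for the clock grid.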
The above research problems have been motivated by the author's decade-long experience dealing with reliability issues in industrial SoCs, first at Texas Instruments and later at Qualcomm.
Postprint (published version)