4 research outputs found

    Développement d'architectures HW/SW tolérantes aux fautes et auto-calibrantes pour les technologies Intégrées 3D

    Despite the advantages of 3D integration, the test, yield and reliability of Through-Silicon-Vias (TSVs) remain among the greatest challenges for 3D systems based on Networks-on-Chip (NoCs). In this thesis, an off-line test strategy is proposed for the TSV interconnects of the inter-die links of 3D NoCs. For the TSV Interconnect Built-In Self-Test (TSV-IBIST), a new test-vector generation strategy is proposed that detects structural faults (opens and shorts) and parametric faults (delay faults). Strategies for correcting transient and permanent TSV faults are also proposed at several abstraction levels: data link and network. At the data link level, techniques using error correction codes (ECC) and retransmission protect the vertical links. Error correction codes are also used for protection at the network level. TSV manufacturing and aging defects are repaired at the data link level with strategies based on redundancy and serialization. In the network, faulty inter-die links are unusable, and a fault-tolerant routing algorithm is proposed. Fault tolerance techniques can be implemented at several levels. The results showed that a multi-level strategy reaches very high reliability levels at a lower cost. Unfortunately, there is no single solution, and each strategy has its advantages and limitations. It is very difficult to evaluate the costs and the impact on performance early in the design flow. Therefore, a fault resilience exploration methodology is proposed for 3D mesh NoCs.
3D technology promises energy-efficient heterogeneous integrated systems, which may open the way to chips with thousands of cores. Silicon dies containing processing elements are stacked and connected by vertical wires called Through-Silicon-Vias.
In 3D chips, interconnecting an increasing number of processing elements requires a scalable high-performance interconnect solution: the 3D Network-on-Chip. Despite the advantages of 3D integration, testing, reliability and yield remain the major challenges for 3D NoC-based systems. In this thesis, the TSV interconnect test issue is addressed by an off-line Interconnect Built-In Self-Test (IBIST) strategy that detects both structural faults (i.e. opens, shorts) and parametric faults (i.e. delays and delay due to crosstalk). The IBIST circuitry implements a novel algorithm based on the aggressor-victim scenario and alleviates limitations of existing strategies. The proposed Kth-aggressor fault (KAF) model assumes that the aggressors of a victim TSV are neighboring wires within a distance given by the aggressor order K. Using this model, TSV interconnect tests of inter-die 3D NoC links may be performed for different aggressor orders, reducing test times and circuitry complexity. In 3D NoCs, TSV permanent and transient faults can be mitigated at different abstraction levels. In this thesis, several error resilience schemes are proposed at data link and network levels. For transient faults, 3D NoC links can be protected using error correction codes (ECC) and retransmission schemes using error detection (Automatic Retransmission Query) and correction codes (i.e. Hybrid error correction and retransmission). For transients along a source-destination path, ECC codes can be implemented at network level (i.e. Network-level Forward Error Correction). Data link solutions also include TSV repair schemes for defects due to fabrication processes (i.e. TSV-Spare-and-Replace and Configurable Serial Links) and to aging (i.e. Interconnect Built-In Self-Repair and Adaptive Serialization). At network level, the faulty inter-die links of 3D mesh NoCs are repaired by implementing a TSV fault-tolerant routing algorithm.
Although single-level solutions can achieve the desired yield and reliability targets, error mitigation can also be realized by combining approaches at several abstraction levels. To this end, multi-level error resilience strategies have been proposed. Experimental results show that there are cases where this multi-level strategy pays off in terms of both cost and performance. Unfortunately, no one-size-fits-all solution exists, as each strategy has its advantages and limitations. For system designers, it is very difficult to assess the costs and the performance impact of error resilience early in the design stages. Therefore, an error resilience exploration (ERX) methodology is proposed for 3D NoCs.
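The Kth-aggressor fault (KAF) model described in the abstract above can be illustrated with a small sketch. The abstract does not say how the distance between TSVs is measured or how the TSVs are laid out, so the code assumes a rectangular grid and a Chebyshev distance (aggressor order K covers the K surrounding rings). The function names `aggressors` and `transition_pair` are hypothetical, and the opposite-transition pattern is one common way to provoke crosstalk delay, not necessarily the thesis's test algorithm.

```python
from itertools import product


def aggressors(victim, rows, cols, k):
    """Return grid positions within Chebyshev distance k of the victim TSV.

    Assumption: TSVs sit on a rows x cols grid and "distance" means
    Chebyshev distance; the source abstract does not specify either.
    """
    r0, c0 = victim
    found = []
    for dr, dc in product(range(-k, k + 1), repeat=2):
        if (dr, dc) == (0, 0):
            continue  # the victim is not its own aggressor
        r, c = r0 + dr, c0 + dc
        if 0 <= r < rows and 0 <= c < cols:
            found.append((r, c))
    return sorted(found)


def transition_pair(victim, rows, cols, k):
    """Build (before, after) vectors: victim rises while its aggressors fall.

    Assumption: TSVs outside the aggressor set are held at 0 in both vectors.
    """
    before = [[0] * cols for _ in range(rows)]
    after = [[0] * cols for _ in range(rows)]
    for r, c in aggressors(victim, rows, cols, k):
        before[r][c] = 1  # aggressor starts high, ends low
    after[victim[0]][victim[1]] = 1  # victim starts low, ends high
    return before, after
```

Under these assumptions, the centre TSV of a 3x3 array has eight first-order aggressors while a corner TSV has three, which shows how a lower aggressor order shrinks the pattern set that has to be applied per victim.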

    Design-for-Test and Test Optimization Techniques for TSV-based 3D Stacked ICs

    As integrated circuits (ICs) continue to scale to smaller dimensions, long interconnects have become the dominant contributor to circuit delay and a significant component of power consumption. In order to reduce the length of these interconnects, 3D integration and 3D stacked ICs (3D SICs) are active areas of research in both academia and industry. 3D SICs not only have the potential to reduce average interconnect length and alleviate many of the problems caused by long global interconnects, but they can offer greater design flexibility over 2D ICs, significant reductions in power consumption and footprint in an era of mobile applications, increased on-chip data bandwidth through delay reduction, and improved heterogeneous integration.
Compared to 2D ICs, the manufacture and test of 3D ICs is significantly more complex. Through-silicon vias (TSVs), which constitute the dense vertical interconnects in a die stack, are a source of additional and unique defects not seen before in ICs. At the same time, testing these TSVs, especially before die stacking, is recognized as a major challenge. The testing of a 3D stack is constrained by limited test access, test pin availability, power, and thermal constraints. Therefore, efficient and optimized test architectures are needed to ensure that pre-bond, partial, and complete stack testing are not prohibitively expensive.
Methods of testing TSVs prior to bonding continue to be a difficult problem due to test access and testability issues. Although some built-in self-test (BIST) techniques have been proposed, these techniques have numerous drawbacks that render them impractical. In this dissertation, a low-cost test architecture is introduced to enable pre-bond TSV test through TSV probing. This has the benefit of not needing large analog test components on the die, which is a significant drawback of many BIST architectures. Coupled with an optimization method described in this dissertation to create parallel test groups for TSVs, test time for pre-bond TSV tests can be significantly reduced. The pre-bond probing methodology is expanded upon to allow for pre-bond scan test as well, to enable both pre-bond TSV and structural test to bring pre-bond known-good-die (KGD) test under a single test paradigm.
The addition of boundary registers on functional TSV paths required for pre-bond probing results in an increase in delay on inter-die functional paths. This cost of test architecture insertion can be a significant drawback, especially considering that one benefit of 3D integration is that critical paths can be partitioned between dies to reduce their delay. This dissertation derives a retiming flow that is used to recover the additional delay added to TSV paths by test cell insertion.
Reducing the cost of test for 3D-SICs is crucial considering that more tests are necessary during 3D-SIC manufacturing. To reduce test cost, the test architecture and test scheduling for the stack must be optimized to reduce test time across all necessary test insertions. This dissertation examines three paradigms for 3D integration - hard dies, firm dies, and soft dies, that give varying degrees of control over 2D test architectures on each die while optimizing the 3D test architecture. Integer linear programming models are developed to provide an optimal 3D test architecture and test schedule for the dies in the 3D stack considering any or all post-bond test insertions. Results show that the ILP models outperform other optimization methods across a range of 3D benchmark circuits.
In summary, this dissertation targets testing and design-for-test (DFT) of 3D SICs. The proposed techniques enable pre-bond TSV and structural test while maintaining a relatively low test cost. Future work will continue to enable testing of 3D SICs to move industry closer to realizing the true potential of 3D integration.
Dissertatio

    Conception d'un micro-réseau intégré NOC tolérant les fautes multiples statiques et dynamiques

    The quest for higher performance and low power consumption has driven the microelectronics industry's race towards aggressive technology scaling and multicore chip designs. In this many-core era, the Network-on-Chip (NoC) has become the most promising solution for on-chip communication because its performance scales with the number of IPs integrated in the chip. Fault tolerance becomes mandatory as CMOS technology continues to shrink. Yield and reliability are increasingly affected by factors such as manufacturing defects, process variations, environmental variations and cosmic radiation. As a result, designs should be able to provide full functionality (e.g. critical systems), or at least allow a degraded mode in a context of high failure rates. To accomplish this, systems should be able to adapt to manufacturing and runtime failures. In this thesis, techniques are proposed to improve the fault tolerance of NoC-based circuits working in harsh environments. As previous works handle one type of fault at a time, we propose here a solution where different kinds of faults can be tolerated concurrently. Considering constraints such as area and power consumption, a fault-tolerant adaptive routing algorithm is proposed, which can cope with transient, intermittent and permanent faults. Combined with existing techniques such as flit retransmission and packet fragmentation, this approach tolerates numerous static and dynamic faults. Simulation results show that the proposed solution achieves a high packet delivery success rate: for a 16x16 2D mesh NoC, 97.68% in the presence of 384 simultaneous link faults, and 93.40% in the presence of 103 simultaneous router faults. This success rate is even higher when the algorithm is extended to NoCs with a torus topology. Another contribution of this thesis is the inclusion of a congestion management function in the proposed routing algorithm.
For this purpose, we introduce a novel congestion metric named Flit Remain. The experimental results show that using this new congestion metric reduces the average latency of the Network-on-Chip by 2.5% to 16.1% compared to existing metrics. The combination of static and dynamic fault-tolerant adaptive routing with congestion management offers a solution for designing a highly resilient NoC.
Advances in semiconductor technologies and the growing demand for computing power are driving the integration of more and more processors on a single chip. As a result, networks-on-chip are gradually replacing communication buses, since they offer more throughput and simpler scaling. At the same time, shrinking feature sizes make circuits more sensitive to the manufacturing process and to their operating environment. Manufacturing defects and failure rates over the circuit's lifetime increase from one technology generation to the next. Integrating fault tolerance techniques in a circuit becomes essential, in particular for circuits operating in very sensitive environments (aerospace, automotive, healthcare, ...). In this thesis, we present techniques for improving the fault tolerance of networks-on-chip integrated in circuits operating in harsh environments. The NoC must thus be able to withstand the presence of numerous faults. Work published so far proposed solutions for a single type of fault. Considering the area and power constraints of the embedded domain, we proposed an adaptive routing algorithm that tolerates intermittent, transient and permanent faults at the same time. By combining and adapting existing techniques for flit retransmission and for packet fragmentation and regrouping, our approach withstands numerous static and dynamic faults. The extensive simulations carried out showed, among other things, that the proposed algorithm reaches a packet delivery rate of 97.68% for a 16x16 2D mesh NoC in the presence of 384 simultaneous faulty links, and 93.40% when 103 routers are faulty. We extended the algorithm to torus topologies, with much better results. Another original aspect of this thesis is that we included a congestion management function in this algorithm. To this end, we defined a new congestion metric (Flit Remain) that is more relevant than the metrics used and published so far. Experiments showed that using this metric reduces latency (at the saturation peak) by 2.5% to 16.1%, depending on the type of traffic generated, compared to the most effective existing metric. The combination of adaptive routing that tolerates static and dynamic faults with congestion management offers a solution that makes the NoC, and by extension the circuit, much more resilient.

    Interconnect Built-In Self-Repair and Adaptive-Serialization (I-BIRAS) for 3D integrated systems

    ISBN 978-1-4244-7724-1. International audience. The high defect rates of TSV manufacturing processes lead to poor yield. Interconnect repair and serialization techniques have been proposed to improve yield. In those earlier papers, the control settings of the repair and serialization circuitry are determined off-chip and stored in one-time-programmable memories. In this work we present an Interconnect Built-In Self-Repair and Adaptive-Serialization approach (I-BIRAS), where interconnect repair and data serialization/deserialization are performed without external intervention (reducing the cost of external equipment) and can be executed at any time (after fabrication and throughout the system's life), thus coping with both fabrication and system-life defects.
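    The repair-then-serialize idea in the abstract above can be sketched as a simple decision procedure. The abstract gives no circuit-level details, so this is only an illustration under assumptions: the function `plan_link`, its arguments and its return format are hypothetical, TSVs are numbered with the functional ones first and the spares last, and the set of faulty TSVs is taken as already known from some on-chip test.

```python
import math


def plan_link(width, spares, faulty):
    """Pick a repair or serialization configuration for one inter-die link.

    width  -- number of data bits the link must carry per transfer
    spares -- number of spare TSVs added to the link (hypothetical layout)
    faulty -- set of TSV indices found defective
    """
    total = width + spares
    healthy = [t for t in range(total) if t not in faulty]
    if not healthy:
        raise ValueError("no working TSV left on this link")
    if len(healthy) >= width:
        # Enough working wires: shift data bits onto healthy TSVs.
        return {"mode": "repair", "cycles": 1, "mapping": healthy[:width]}
    # Too few wires: send the word over several cycles on what remains.
    cycles = math.ceil(width / len(healthy))
    return {"mode": "serialize", "cycles": cycles, "mapping": healthy}
```

    In this sketch, a 4-bit link with one spare and one faulty TSV is repaired at full bandwidth, while a second fault on the same link forces a two-cycle serialized transfer. Re-running the procedure whenever a new fault is detected mirrors the abstract's point that the configuration can be updated at any time during system life.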