12 research outputs found

    Développement d'architectures HW/SW tolérantes aux fautes et auto-calibrantes pour les technologies Intégrées 3D

    Get PDF
    Malgré les avantages de l'intégration 3D, le test, le rendement et la fiabilité des Through-Silicon-Vias (TSVs) restent parmi les plus grands défis pour les systèmes 3D à base de Réseaux-sur-Puce (Network-on-Chip - NoC). Dans cette thèse, une stratégie de test hors-ligne a été proposé pour les interconnections TSV des liens inter-die des NoCs 3D. Pour le TSV Interconnect Built-In Self-Test (TSV-IBIST) on propose une nouvelle stratégie pour générer des vecteurs de test qui permet la détection des fautes structuraux (open et short) et paramétriques (fautes de délaye). Des stratégies de correction des fautes transitoires et permanents sur les TSV sont aussi proposées aux plusieurs niveaux d'abstraction: data link et network. Au niveau data link, des techniques qui utilisent des codes de correction (ECC) et retransmission sont utilisées pour protégé les liens verticales. Des codes de correction sont aussi utilisés pour la protection au niveau network. Les défauts de fabrication ou vieillissement des TSVs sont réparé au niveau data link avec des stratégies à base de redondance et sérialisation. Dans le réseau, les liens inter-die défaillante ne sont pas utilisables et un algorithme de routage tolérant aux fautes est proposé. On peut implémenter des techniques de tolérance aux fautes sur plusieurs niveaux. Les résultats ont montré qu'une stratégie multi-level atteint des très hauts niveaux de fiabilité avec un cout plus bas. Malheureusement, il n'y as pas une solution unique et chaque stratégie a ses avantages et limitations. C'est très difficile d'évaluer tôt dans le design flow les couts et l'impact sur la performance. Donc, une méthodologie d'exploration de la résilience aux fautes est proposée pour les NoC 3D mesh.3D technology promises energy-efficient heterogeneous integrated systems, which may open the way to thousands cores chips. Silicon dies containing processing elements are stacked and connected by vertical wires called Through-Silicon-Vias. In 3D chips, interconnecting an increasing number of processing elements requires a scalable high-performance interconnect solution: the 3D Network-on-Chip. Despite the advantages of 3D integration, testing, reliability and yield remain the major challenges for 3D NoC-based systems. In this thesis, the TSV interconnect test issue is addressed by an off-line Interconnect Built-In Self-Test (IBIST) strategy that detects both structural (i.e. opens, shorts) and parametric faults (i.e. delays and delay due to crosstalk). The IBIST circuitry implements a novel algorithm based on the aggressor-victim scenario and alleviates limitations of existing strategies. The proposed Kth-aggressor fault (KAF) model assumes that the aggressors of a victim TSV are neighboring wires within a distance given by the aggressor order K. Using this model, TSV interconnect tests of inter-die 3D NoC links may be performed for different aggressor order, reducing test times and circuitry complexity. In 3D NoCs, TSV permanent and transient faults can be mitigated at different abstraction levels. In this thesis, several error resilience schemes are proposed at data link and network levels. For transient faults, 3D NoC links can be protected using error correction codes (ECC) and retransmission schemes using error detection (Automatic Retransmission Query) and correction codes (i.e. Hybrid error correction and retransmission).For transients along a source-destination path, ECC codes can be implemented at network level (i.e. Network-level Forward Error Correction). Data link solutions also include TSV repair schemes for faults due to fabrication processes (i.e. TSV-Spare-and-Replace and Configurable Serial Links) and aging (i.e. Interconnect Built-In Self-Repair and Adaptive Serialization) defects. At network-level, the faulty inter-die links of 3D mesh NoCs are repaired by implementing a TSV fault-tolerant routing algorithm. Although single-level solutions can achieve the desired yield / reliability targets, error mitigation can be realized by a combination of approaches at several abstraction levels. To this end, multi-level error resilience strategies have been proposed. Experimental results show that there are cases where this multi-layer strategy pays-off both in terms of cost and performance. Unfortunately, one-fits-all solution does not exist, as each strategy has its advantages and limitations. For system designers, it is very difficult to assess early in the design stages the costs and the impact on performance of error resilience. Therefore, an error resilience exploration (ERX) methodology is proposed for 3D NoCs.SAVOIE-SCD - Bib.électronique (730659901) / SudocGRENOBLE1/INP-Bib.électronique (384210012) / SudocGRENOBLE2/3-Bib.électronique (384219901) / SudocSudocFranceF

    Exploration of communication strategies for computation intensive Systems-On-Chip

    Get PDF

    On-Chip-Netzwerk-Architekturen fĂĽr eingebettete hierarchische Multiprozessoren

    Get PDF
    Ax J. On-Chip-Netzwerk-Architekturen für eingebettete hierarchische Multiprozessoren. Bielefeld: Universität Bielefeld; 2019.Das Ziel der vorliegenden Arbeit ist die Realisierung und Analyse einer skalierbaren Verbindungsstruktur für ein Multi-Prozessorsystem auf einem Chip (MPSoC). Durch die zunehmende Digitalisierung werden in immer mehr Geräten des täglichen Lebens und der Industrie mikroelektronische Systeme eingesetzt. Hierbei handelt es sich häufig um energiebeschränkte Systeme, die zusätzlich einen stetig steigenden Bedarf an Rechenleistung aufweisen. Ein Trend, diesen Bedarf zu decken ist die Integration von zunehmend mehr Prozessorkernen auf einem einzelnen Mikrochip. Many-Core-Systeme mit vielen hunderten bis tausenden ressourceneffizienten CPU-Kernen versprechen hierbei eine besonders hohe Energieeffizienz. Im Vergleich zu Systemen mit wenigen leistungsfähigen, jedoch auch komplexeren CPUs, wird bei Many-Cores die Rechenleistung durch massive Parallelität erzielt. In der AG Kognitronik und Sensorik der Universität Bielefeld wird dazu das CoreVA-MPSoC entwickelt. Um hunderte von CPUs auf einen Chip zu integrieren, verfügt das CoreVA-MPSoC über eine hierarchische Verbindungsstruktur. Diese besteht aus einem On-Chip-Netzwerk (NoC), welches eine Vielzahl von CPU-Cluster koppelt. In jedem CPU-Cluster sind mehrere ressourceneffiziente VLIW-Prozessorkerne über eine eng gekoppelte Bus-Struktur verbunden. Der Fokus dieser Arbeit ist die Entwicklung und Entwurfsraumexploration einer ressourceneffizienten NoC-Architektur für den Einsatz im CoreVA-MPSoC. Die Entwurfsraumexploration findet dazu auf verschiedenen Ebenen statt. Auf der Ebene der Verbindungsstruktur des NoCs werden verschiedene Topologien und Mechanismen der Flusskontrolle untersucht. Des Weiteren wird die Entwicklung und Analyse eines synchronen, mesochronen und asynchronen NoCs vorgestellt, um die Skalierbarkeit und Energieeffizienz dieser Methoden zu untersuchen. Eine weitere Ebene bildet die Schnittstelle zum Prozessorsystem bzw. CPU-Cluster, die einen maßgeblichen Einfluss auf die Softwareentwicklung und Gesamtperformanz des Systems hat. Auf Systemebene wird schließlich die Anbindung verschiedener Speicherarchitekturen an das NoC vorgestellt und deren Auswirkung auf Performanz und Energiebedarf analysiert. Ein abstraktes Modell des CoreVA-MPSoCs mit Fokus auf dem NoC erlaubt die Abschätzung von Fläche, Performanz und Energie des Systems, bzw. der Ausführung von Streaming-Anwendungen. Dieses Modell kann im CoreVA-MPSoC-Compiler für die automatische Abbildung von Anwendungen auf dem MPSoC eingesetzt werden. Zehn Streaming-Anwendungen, vorwiegend aus dem Bereich der Signal- und Bildverarbeitung, zeigen bei der Abbildung auf einem CoreVA-MPSoC mit 32 CPUs eine durchschnittliche Beschleunigung um den Faktor 24 gegenüber der Ausführung auf einer CPU. Ein CoreVA-MPSoC mit 64 CPUs und insgesamt 3MB Speicher besitzt bei einer prototypischen Implementierung in einer 28-nm-FD-SOI-Standardzellenbibliothek einen Flächenbedarf von 14,4mm2. Bei einer Taktfrequenz von 700MHz liegt die durchschnittliche Leistungsaufnahme bei 2W. Eine FPGA-basierte Emulation auf einem FPGA-Cluster aus Xilinx Virtex-5-FPGAs erlaubt zudem eine skalierbare Verifikation eines CoreVA-MPSoCs mit nahezu beliebig vielen CPUs

    Memory hierarchy and data communication in heterogeneous reconfigurable SoCs

    Get PDF
    The miniaturization race in the hardware industry aiming at continuous increasing of transistor density on a die does not bring respective application performance improvements any more. One of the most promising alternatives is to exploit a heterogeneous nature of common applications in hardware. Supported by reconfigurable computation, which has already proved its efficiency in accelerating data intensive applications, this concept promises a breakthrough in contemporary technology development. Memory organization in such heterogeneous reconfigurable architectures becomes very critical. Two primary aspects introduce a sophisticated trade-off. On the one hand, a memory subsystem should provide well organized distributed data structure and guarantee the required data bandwidth. On the other hand, it should hide the heterogeneous hardware structure from the end-user, in order to support feasible high-level programmability of the system. This thesis work explores the heterogeneous reconfigurable hardware architectures and presents possible solutions to cope the problem of memory organization and data structure. By the example of the MORPHEUS heterogeneous platform, the discussion follows the complete design cycle, starting from decision making and justification, until hardware realization. Particular emphasis is made on the methods to support high system performance, meet application requirements, and provide a user-friendly programmer interface. As a result, the research introduces a complete heterogeneous platform enhanced with a hierarchical memory organization, which copes with its task by means of separating computation from communication, providing reconfigurable engines with computation and configuration data, and unification of heterogeneous computational devices using local storage buffers. It is distinguished from the related solutions by distributed data-flow organization, specifically engineered mechanisms to operate with data on local domains, particular communication infrastructure based on Network-on-Chip, and thorough methods to prevent computation and communication stalls. In addition, a novel advanced technique to accelerate memory access was developed and implemented

    Embedded computing systems design: architectural and application perspectives

    Get PDF
    Questo elaborato affronta varie problematiche legate alla progettazione e all'implementazione dei moderni sistemi embedded di computing, ponendo in rilevo, e talvolta in contrapposizione, le sfide che emergono all'avanzare della tecnologia ed i requisiti che invece emergono a livello applicativo, derivanti dalle necessità degli utenti finali e dai trend di mercato. La discussione sarà articolata tenendo conto di due punti di vista: la progettazione hardware e la loro applicazione a livello di sistema. A livello hardware saranno affrontati nel dettaglio i problemi di interconnettività on-chip. Aspetto che riguarda la parallelizzazione del calcolo, ma anche l'integrazione di funzionalità eterogenee. Sarà quindi discussa un'architettura d'interconnessione denominata Network-on-Chip (NoC). La soluzione proposta è in grado di supportare funzionalità avanzate di networking direttamente in hardware, consentendo tuttavia di raggiungere sempre un compromesso ottimale tra prestazioni in termini di traffico e requisiti di implementazioni a seconda dell'applicazione specifica. Nella discussione di questa tematica, verrà posto l'accento sul problema della configurabilità dei blocchi che compongono una NoC. Quello della configurabilità, è un problema sempre più sentito nella progettazione dei sistemi complessi, nei quali si cerca di sviluppare delle funzionalità, anche molto evolute, ma che siano semplicemente riutilizzabili. A tale scopo sarà introdotta una nuova metodologia, denominata Metacoding che consiste nell'astrarre i problemi di configurabilità attraverso linguaggi di programmazione di alto livello. Sulla base del metacoding verrà anche proposto un flusso di design automatico in grado di semplificare la progettazione e la configurazione di una NoC da parte del designer di rete. Come anticipato, la discussione si sposterà poi a livello di sistema, per affrontare la progettazione di tali sistemi dal punto di vista applicativo, focalizzando l'attenzione in particolare sulle applicazioni di monitoraggio remoto. A tal riguardo saranno studiati nel dettaglio tutti gli aspetti che riguardano la progettazione di un sistema per il monitoraggio di pazienti affetti da scompenso cardiaco cronico. Si partirà dalla definizione dei requisiti, che, come spesso accade a questo livello, derivano principalmente dai bisogni dell'utente finale, nel nostro caso medici e pazienti. Verranno discusse le problematiche di acquisizione, elaborazione e gestione delle misure. Il sistema proposto introduce vari aspetti innovativi tra i quali il concetto di protocollo operativo e l'elevata interoperabilità offerta. In ultima analisi, verranno riportati i risultati relativi alla sperimentazione del sistema implementato. Infine, il tema del monitoraggio remoto sarà concluso con lo studio delle reti di distribuzione elettrica intelligenti: le Smart Grid, cercando di fare uno studio dello stato dell'arte del settore, proponendo un'architettura di Home Area Network (HAN) e suggerendone una possibile implementazione attraverso Commercial Off the Shelf (COTS)

    Network-on-Chip

    Get PDF
    Addresses the Challenges Associated with System-on-Chip Integration Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. This text comprises 12 chapters and covers: The evolution of NoC from SoC—its research and developmental challenges NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces The router design strategies followed in NoCs The evaluation mechanism of NoC architectures The application mapping strategies followed in NoCs Low-power design techniques specifically followed in NoCs The signal integrity and reliability issues of NoC The details of NoC testing strategies reported so far The problem of synthesizing application-specific NoCs Reconfigurable NoC design issues Direction of future research and development in the field of NoC Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, and researchers and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems

    Improving the Scalability of High Performance Computer Systems

    Full text link
    Improving the performance of future computing systems will be based upon the ability of increasing the scalability of current technology. New paths need to be explored, as operating principles that were applied up to now are becoming irrelevant for upcoming computer architectures. It appears that scaling the number of cores, processors and nodes within an system represents the only feasible alternative to achieve Exascale performance. To accomplish this goal, we propose three novel techniques addressing different layers of computer systems. The Tightly Coupled Cluster technique significantly improves the communication for inter node communication within compute clusters. By improving the latency by an order of magnitude over existing solutions the cost of communication is considerably reduced. This enables to exploit fine grain parallelism within applications, thereby, extending the scalability considerably. The mechanism virtually moves the network interconnect into the processor, bypassing the latency of the I/O interface and rendering protocol conversions unnecessary. The technique is implemented entirely through firmware and kernel layer software utilizing off-the-shelf AMD processors. We present a proof-of-concept implementation and real world benchmarks to demonstrate the superior performance of our technique. In particular, our approach achieves a software-to-software communication latency of 240 ns between two remote compute nodes. The second part of the dissertation introduces a new framework for scalable Networks-on-Chip. A novel rapid prototyping methodology is proposed, that accelerates the design and implementation substantially. Due to its flexibility and modularity a large application space is covered ranging from Systems-on-chip, to high performance many-core processors. The Network-on-Chip compiler enables to generate complex networks in the form of synthesizable register transfer level code from an abstract design description. Our engine supports different target technologies including Field Programmable Gate Arrays and Application Specific Integrated Circuits. The framework enables to build large designs while minimizing development and verification efforts. Many topologies and routing algorithms are supported by partitioning the tasks into several layers and by the introduction of a protocol agnostic architecture. We provide a thorough evaluation of the design that shows excellent results regarding performance and scalability. The third part of the dissertation addresses the Processor-Memory Interface within computer architectures. The increasing compute power of many-core processors, leads to an equally growing demand for more memory bandwidth and capacity. Current processor designs exhibit physical limitations that restrict the scalability of main memory. To address this issue we propose a memory extension technique that attaches large amounts of DRAM memory to the processor via a low pin count interface using high speed serial transceivers. Our technique transparently integrates the extension memory into the system architecture by providing full cache coherency. Therefore, applications can utilize the memory extension by applying regular shared memory programming techniques. By supporting daisy chained memory extension devices and by introducing the asymmetric probing approach, the proposed mechanism ensures high scalability. We furthermore propose a DMA offloading technique to improve the performance of the processor memory interface. The design has been implemented in a Field Programmable Gate Array based prototype. Driver software and firmware modifications have been developed to bring up the prototype in a Linux based system. We show microbenchmarks that prove the feasibility of our design
    corecore