20 research outputs found

    Custom Cell Placement Automation for Asynchronous VLSI

    Get PDF
    Asynchronous Very-Large-Scale-Integration (VLSI) integrated circuits have demonstrated many advantages over their synchronous counterparts, including low power consumption, elastic pipelining, robustness against manufacturing and temperature variations, etc. However, the lack of dedicated electronic design automation (EDA) tools, especially physical layout automation tools, largely limits the adoption of asynchronous circuits. Existing commercial placement tools are optimized for synchronous circuits, and require a standard cell library provided by semiconductor foundries to complete the physical design. The physical layouts of cells in this library have the same height to simplify the placement problem and the power distribution network. Although the standard cell methodology also works for asynchronous designs, the performance is inferior compared with counterparts designed using the full-custom design methodology. To tackle this challenge, we propose a gridded cell layout methodology for asynchronous circuits, in which the cell height and cell width can be any integer multiple of two grid values. The gridded cell approach combines the shape regularity of standard cells with the size flexibility of full-custom layouts. Therefore, this approach can achieve a better space utilization ratio and lower wire length for asynchronous designs. Experiments have shown that the gridded cell placement approach reduces area without impacting the routability. We have also used this placer to tape out a chip in a 65nm process technology, demonstrating that our placer generates design-rule clean results

    Design of asynchronous microprocessor for power proportionality

    Get PDF
    PhD ThesisMicroprocessors continue to get exponentially cheaper for end users following Moore’s law, while the costs involved in their design keep growing, also at an exponential rate. The reason is the ever increasing complexity of processors, which modern EDA tools struggle to keep up with. This makes further scaling for performance subject to a high risk in the reliability of the system. To keep this risk low, yet improve the performance, CPU designers try to optimise various parts of the processor. Instruction Set Architecture (ISA) is a significant part of the whole processor design flow, whose optimal design for a particular combination of available hardware resources and software requirements is crucial for building processors with high performance and efficient energy utilisation. This is a challenging task involving a lot of heuristics and high-level design decisions. Another issue impacting CPU reliability is continuous scaling for power consumption. For the last decades CPU designers have been mainly focused on improving performance, but “keeping energy and power consumption in mind”. The consequence of this was a development of energy-efficient systems, where energy was considered as a resource whose consumption should be optimised. As CMOS technology was progressing, with feature size decreasing and power delivered to circuit components becoming less stable, the energy resource turned from an optimisation criterion into a constraint, sometimes a critical one. At this point power proportionality becomes one of the most important aspects in system design. Developing methods and techniques which will address the problem of designing a power-proportional microprocessor, capable to adapt to varying operating conditions (such as low or even unstable voltage levels) and application requirements in the runtime, is one of today’s grand challenges. In this thesis this challenge is addressed by proposing a new design flow for the development of an ISA for microprocessors, which can be altered to suit a particular hardware platform or a specific operating mode. This flow uses an expressive and powerful formalism for the specification of processor instruction sets called the Conditional Partial Order Graph (CPOG). The CPOG model captures large sets of behavioural scenarios for a microarchitectural level in a computationally efficient form amenable to formal transformations for synthesis, verification and automated derivation of asynchronous hardware for the CPU microcontrol. The feasibility of the methodology, novel design flow and a number of optimisation techniques was proven in a full size asynchronous Intel 8051 microprocessor and its demonstrator silicon. The chip showed the ability to work in a wide range of operating voltage and environmental conditions. Depending on application requirements and power budget our ASIC supports several operating modes: one optimised for energy consumption and the other one for performance. This was achieved by extending a traditional datapath structure with an auxiliary control layer for adaptable and fault tolerant operation. These and other optimisations resulted in a reconfigurable and adaptable implementation, which was proven by measurements, analysis and evaluation of the chip.EPSR

    A Modular Approach to Adaptive Reactive Streaming Systems

    Get PDF
    The latest generations of FPGA devices offer large resource counts that provide the headroom to implement large-scale and complex systems. However, there are increasing challenges for the designer, not just because of pure size and complexity, but also in harnessing effectively the flexibility and programmability of the FPGA. A central issue is the need to integrate modules from diverse sources to promote modular design and reuse. Further, the capability to perform dynamic partial reconfiguration (DPR) of FPGA devices means that implemented systems can be made reconfigurable, allowing components to be changed during operation. However, use of DPR typically requires low-level planning of the system implementation, adding to the design challenge. This dissertation presents ReShape: a high-level approach for designing systems by interconnecting modules, which gives a ‘plug and play’ look and feel to the designer, is supported by tools that carry out implementation and verification functions, and is carried through to support system reconfiguration during operation. The emphasis is on the inter-module connections and abstracting the communication patterns that are typical between modules – for example, the streaming of data that is common in many FPGA-based systems, or the reading and writing of data to and from memory modules. ShapeUp is also presented as the static precursor to ReShape. In both, the details of wiring and signaling are hidden from view, via metadata associated with individual modules. ReShape allows system reconfiguration at the module level, by supporting type checking of replacement modules and by managing the overall system implementation, via metadata associated with its FPGA floorplan. The methodology and tools have been implemented in a prototype for a broad domain-specific setting – networking systems – and have been validated on real telecommunications design projects

    Architectural Exploration of KeyRing Self-Timed Processors

    Get PDF
    RÉSUMÉ Les derniĂšres dĂ©cennies ont vu l’augmentation des performances des processeurs contraintes par les limites imposĂ©es par la consommation d’énergie des systĂšmes Ă©lectroniques : des trĂšs basses consommations requises pour les objets connectĂ©s, aux budgets de dĂ©penses Ă©lectriques des serveurs, en passant par les limitations thermiques et la durĂ©e de vie des batteries des appareils mobiles. Cette forte demande en processeurs efficients en Ă©nergie, couplĂ©e avec les limitations de la rĂ©duction d’échelle des transistors—qui ne permet plus d’amĂ©liorer les performances Ă  densitĂ© de puissance constante—, conduit les concepteurs de circuits intĂ©grĂ©s Ă  explorer de nouvelles microarchitectures permettant d’obtenir de meilleures performances pour un budget Ă©nergĂ©tique donnĂ©. Cette thĂšse s’inscrit dans cette tendance en proposant une nouvelle microarchitecture de processeur, appelĂ©e KeyRing, conçue avec l’intention de rĂ©duire la consommation d’énergie des processeurs. La frĂ©quence d’opĂ©ration des transistors dans les circuits intĂ©grĂ©s est proportionnelle Ă  leur consommation dynamique d’énergie. Par consĂ©quent, les techniques de conception permettant de rĂ©duire dynamiquement le nombre de transistors en opĂ©ration sont trĂšs largement adoptĂ©es pour amĂ©liorer l’efficience Ă©nergĂ©tique des processeurs. La technique de clock-gating est particuliĂšrement usitĂ©e dans les circuits synchrones, car elle rĂ©duit l’impact de l’horloge globale, qui est la principale source d’activitĂ©. La microarchitecture KeyRing prĂ©sentĂ©e dans cette thĂšse utilise une mĂ©thode de synchronisation dĂ©centralisĂ©e et asynchrone pour rĂ©duire l’activitĂ© des circuits. Elle est dĂ©rivĂ©e du processeur AnARM, un processeur dĂ©veloppĂ© par Octasic sur la base d’une microarchitecture asynchrone ad hoc. Bien qu’il soit plus efficient en Ă©nergie que des alternatives synchrones, le AnARM est essentiellement incompatible avec les mĂ©thodes de synthĂšse et d’analyse temporelle statique standards. De plus, sa technique de conception ad hoc ne s’inscrit que partiellement dans les paradigmes de conceptions asynchrones. Cette thĂšse propose une approche rigoureuse pour dĂ©finir les principes gĂ©nĂ©raux de cette technique de conception ad hoc, en faisant levier sur la littĂ©rature asynchrone. La microarchitecture KeyRing qui en rĂ©sulte est dĂ©veloppĂ©e en association avec une mĂ©thode de conception automatisĂ©e, qui permet de s’affranchir des incompatibilitĂ©s natives existant entre les outils de conception et les systĂšmes asynchrones. La mĂ©thode proposĂ©e permet de pleinement mettre Ă  profit les flots de conception standards de l’industrie microĂ©lectronique pour rĂ©aliser la synthĂšse et la vĂ©rification des circuits KeyRing. Cette thĂšse propose Ă©galement des protocoles expĂ©rimentaux, dont le but est de renforcer la relation de causalitĂ© entre la microarchitecture KeyRing et une rĂ©duction de la consommation Ă©nergĂ©tique des processeurs, comparativement Ă  des alternatives synchrones Ă©quivalentes.----------ABSTRACT Over the last years, microprocessors have had to increase their performances while keeping their power envelope within tight bounds, as dictated by the needs of various markets: from the ultra-low power requirements of the IoT, to the electrical power consumption budget in enterprise servers, by way of passive cooling and day-long battery life in mobile devices. This high demand for power-efficient processors, coupled with the limitations of technology scaling—which no longer provides improved performances at constant power densities—, is leading designers to explore new microarchitectures with the goal of pulling more performances out of a fixed power budget. This work enters into this trend by proposing a new processor microarchitecture, called KeyRing, having a low-power design intent. The switching activity of integrated circuits—i.e. transistors switching on and off—directly affects their dynamic power consumption. Circuit-level design techniques such as clock-gating are widely adopted as they dramatically reduce the impact of the global clock in synchronous circuits, which constitutes the main source of switching activity. The KeyRing microarchitecture presented in this work uses an asynchronous clocking scheme that relies on decentralized synchronization mechanisms to reduce the switching activity of circuits. It is derived from the AnARM, a power-efficient ARM processor developed by Octasic using an ad hoc asynchronous microarchitecture. Although it delivers better power-efficiency than synchronous alternatives, it is for the most part incompatible with standard timing-driven synthesis and Static Timing Analysis (STA). In addition, its design style does not fit well within the existing asynchronous design paradigms. This work lays the foundations for a more rigorous definition of this rather unorthodox design style, using circuits and methods coming from the asynchronous literature. The resulting KeyRing microarchitecture is developed in combination with Electronic Design Automation (EDA) methods that alleviate incompatibility issues related to ad hoc clocking, enabling timing-driven optimizations and verifications of KeyRing circuits using industry-standard design flows. In addition to bridging the gap with standard design practices, this work also proposes comprehensive experimental protocols that aims to strengthen the causal relation between the reported asynchronous microarchitecture and a reduced power consumption compared with synchronous alternatives. The main achievement of this work is a framework that enables the architectural exploration of circuits using the KeyRing microarchitecture

    Analyse und Erweiterung eines fehler-toleranten NoC fĂŒr SRAM-basierte FPGAs in Weltraumapplikationen

    Get PDF
    Data Processing Units for scientific space mission need to process ever higher volumes of data and perform ever complex calculations. But the performance of available space-qualified general purpose processors is just in the lower three digit megahertz range, which is already insufficient for some applications. As an alternative, suitable processing steps can be implemented in hardware on a space-qualified SRAM-based FPGA. However, suitable devices are susceptible against space radiation. At the Institute for Communication and Network Engineering a fault-tolerant, network-based communication architecture was developed, which enables the construction of processing chains on the basis of different processing modules within suitable SRAM-based FPGAs and allows the exchange of single processing modules during runtime, too. The communication architecture and its protocol shall isolate non SEU mitigated or just partial SEU mitigated modules affected by radiation-induced faults to prohibit the propagation of errors within the remaining System-on-Chip. In the context of an ESA study, this communication architecture was extended with further components and implemented in a representative hardware platform. Based on the acquired experiences during the study, this work analyses the actual fault-tolerance characteristics as well as weak points of this initial implementation. At appropriate locations, the communication architecture was extended with mechanisms for fault-detection and fault-differentiation as well as with a hardware-based monitoring solution. Both, the former measures and the extension of the employed hardware-platform with selective fault-injection capabilities for the emulation of radiation-induced faults within critical areas of a non SEU mitigated processing module, are used to evaluate the effects of radiation-induced faults within the communication architecture. By means of the gathered results, further measures to increase fast detection and isolation of faulty nodes are developed, selectively implemented and verified. In particular, the ability of the communication architecture to isolate network nodes without SEU mitigation could be significantly improved.Instrumentenrechner fĂŒr wissenschaftliche Weltraummissionen mĂŒssen ein immer höheres Datenvolumen verarbeiten und immer komplexere Berechnungen ausfĂŒhren. Die Performanz von verfĂŒgbaren qualifizierten Universalprozessoren liegt aber lediglich im unteren dreistelligen Megahertz-Bereich, was fĂŒr einige Anwendungen bereits nicht mehr ausreicht. Als Alternative bietet sich die Implementierung von entsprechend geeigneten Datenverarbeitungsschritten in Hardware auf einem qualifizierten SRAM-basierten FPGA an. Geeignete Bausteine sind jedoch empfindlich gegenĂŒber der Strahlungsumgebung im Weltraum. Am Institut fĂŒr Datentechnik und Kommunikationsnetze wurde eine fehlertolerante netzwerk-basierte Kommunikationsarchitektur entwickelt, die innerhalb eines geeigneten SRAM-basierten FPGAs Datenverarbeitungsmodule miteinander nach Bedarf zu Verarbeitungsketten verbindet, sowie den Austausch von einzelnen Modulen im Betrieb ermöglicht. Nicht oder nur partiell SEU mitigierte Module sollen bei strahlungsbedingten Fehlern im Modul durch das Protokoll und die Fehlererkennungsmechanismen der Kommunikationsarchitektur isoliert werden, um ein Ausbreiten des Fehlers im restlichen System-on-Chip zu verhindern. Im Kontext einer ESA Studie wurde diese Kommunikationsarchitektur um Komponenten erweitert und auf einer reprĂ€sentativen Hardwareplattform umgesetzt. Basierend auf den gesammelten Erfahrungen aus der Studie, wird in dieser Arbeit eine Analyse der tatsĂ€chlichen Fehlertoleranz-Eigenschaften sowie der Schwachstellen dieser ursprĂŒnglichen Implementierung durchgefĂŒhrt. Die Kommunikationsarchitektur wurde an geeigneten Stellen um Fehlerdetektierungs- und Fehlerunterscheidungsmöglichkeiten erweitert, sowie um eine hardwarebasierte Überwachung ergĂ€nzt. Sowohl diese Maßnahmen, als auch die Erweiterung der Hardwareplattform um gezielte Fehlerinjektions-Möglichkeiten zum Emulieren von strahlungsinduzierten Fehlern in kritischen Komponenten eines nicht SEU mitigierten Prozessierungsmoduls werden genutzt, um die tatsĂ€chlichen auftretenden Effekte in der Kommunikationsarchitektur zu evaluieren. Anhand der Ergebnisse werden weitere Verbesserungsmaßnahmen speziell zur schnellen Detektierung und Isolation von fehlerhaften Knoten erarbeitet, selektiv implementiert und verifiziert. Insbesondere die FĂ€higkeit, fehlerhafte, nicht SEU mitigierte Netzwerkknoten innerhalb der Kommunikationsarchitektur zu isolieren, konnte dabei deutlich verbessert werden

    Thermal-Aware Networked Many-Core Systems

    Get PDF
    Advancements in IC processing technology has led to the innovation and growth happening in the consumer electronics sector and the evolution of the IT infrastructure supporting this exponential growth. One of the most difficult obstacles to this growth is the removal of large amount of heatgenerated by the processing and communicating nodes on the system. The scaling down of technology and the increase in power density is posing a direct and consequential effect on the rise in temperature. This has resulted in the increase in cooling budgets, and affects both the life-time reliability and performance of the system. Hence, reducing on-chip temperatures has become a major design concern for modern microprocessors. This dissertation addresses the thermal challenges at different levels for both 2D planer and 3D stacked systems. It proposes a self-timed thermal monitoring strategy based on the liberal use of on-chip thermal sensors. This makes use of noise variation tolerant and leakage current based thermal sensing for monitoring purposes. In order to study thermal management issues from early design stages, accurate thermal modeling and analysis at design time is essential. In this regard, spatial temperature profile of the global Cu nanowire for on-chip interconnects has been analyzed. It presents a 3D thermal model of a multicore system in order to investigate the effects of hotspots and the placement of silicon die layers, on the thermal performance of a modern ip-chip package. For a 3D stacked system, the primary design goal is to maximise the performance within the given power and thermal envelopes. Hence, a thermally efficient routing strategy for 3D NoC-Bus hybrid architectures has been proposed to mitigate on-chip temperatures by herding most of the switching activity to the die which is closer to heat sink. Finally, an exploration of various thermal-aware placement approaches for both the 2D and 3D stacked systems has been presented. Various thermal models have been developed and thermal control metrics have been extracted. An efficient thermal-aware application mapping algorithm for a 2D NoC has been presented. It has been shown that the proposed mapping algorithm reduces the effective area reeling under high temperatures when compared to the state of the art.Siirretty Doriast
    corecore