63 research outputs found

    Statistical circuit simulations - from ‘atomistic’ compact models to statistical standard cell characterisation

    Get PDF
    This thesis describes the development and application of statistical circuit simulation methodologies to analyse digital circuits subject to intrinsic parameter fluctuations. The specific nature of intrinsic parameter fluctuations are discussed, and we explain the crucial importance to the semiconductor industry of developing design tools which accurately account for their effects. Current work in the area is reviewed, and three important factors are made clear: any statistical circuit simulation methodology must be based on physically correct, predictive models of device variability; the statistical compact models describing device operation must be characterised for accurate transient analysis of circuits; analysis must be carried out on realistic circuit components. Improving on previous efforts in the field, we posit a statistical circuit simulation methodology which accounts for all three of these factors. The established 3-D Glasgow atomistic simulator is employed to predict electrical characteristics for devices aimed at digital circuit applications, with gate lengths from 35 nm to 13 nm. Using these electrical characteristics, extraction of BSIM4 compact models is carried out and their accuracy in performing transient analysis using SPICE is validated against well characterised mixed-mode TCAD simulation results for 35 nm devices. Static d.c. simulations are performed to test the methodology, and a useful analytic model to predict hard logic fault limitations on CMOS supply voltage scaling is derived as part of this work. Using our toolset, the effect of statistical variability introduced by random discrete dopants on the dynamic behaviour of inverters is studied in detail. As devices scaled, dynamic noise margin variation of an inverter is increased and higher output load or input slew rate improves the noise margins and its variation. Intrinsic delay variation based on CV/I delay metric is also compared using ION and IEFF definitions where the best estimate is obtained when considering ION and input transition time variations. Critical delay distribution of a path is also investigated where it is shown non-Gaussian. Finally, the impact of the cell input slew rate definition on the accuracy of the inverter cell timing characterisation in NLDM format is investigated

    A software controlled voltage tuning system using multi-purpose ring oscillators

    Full text link
    This paper presents a novel software driven voltage tuning method that utilises multi-purpose Ring Oscillators (ROs) to provide process variation and environment sensitive energy reductions. The proposed technique enables voltage tuning based on the observed frequency of the ROs, taken as a representation of the device speed and used to estimate a safe minimum operating voltage at a given core frequency. A conservative linear relationship between RO frequency and silicon speed is used to approximate the critical path of the processor. Using a multi-purpose RO not specifically implemented for critical path characterisation is a unique approach to voltage tuning. The parameters governing the relationship between RO and silicon speed are obtained through the testing of a sample of processors from different wafer regions. These parameters can then be used on all devices of that model. The tuning method and software control framework is demonstrated on a sample of XMOS XS1-U8A-64 embedded microprocessors, yielding a dynamic power saving of up to 25% with no performance reduction and no negative impact on the real-time constraints of the embedded software running on the processor

    Miniaturized Transistors, Volume II

    Get PDF
    In this book, we aim to address the ever-advancing progress in microelectronic device scaling. Complementary Metal-Oxide-Semiconductor (CMOS) devices continue to endure miniaturization, irrespective of the seeming physical limitations, helped by advancing fabrication techniques. We observe that miniaturization does not always refer to the latest technology node for digital transistors. Rather, by applying novel materials and device geometries, a significant reduction in the size of microelectronic devices for a broad set of applications can be achieved. The achievements made in the scaling of devices for applications beyond digital logic (e.g., high power, optoelectronics, and sensors) are taking the forefront in microelectronic miniaturization. Furthermore, all these achievements are assisted by improvements in the simulation and modeling of the involved materials and device structures. In particular, process and device technology computer-aided design (TCAD) has become indispensable in the design cycle of novel devices and technologies. It is our sincere hope that the results provided in this Special Issue prove useful to scientists and engineers who find themselves at the forefront of this rapidly evolving and broadening field. Now, more than ever, it is essential to look for solutions to find the next disrupting technologies which will allow for transistor miniaturization well beyond silicon’s physical limits and the current state-of-the-art. This requires a broad attack, including studies of novel and innovative designs as well as emerging materials which are becoming more application-specific than ever before

    Firmware Development of a LoRaWAN Multi-Sensor Generic Node : an Industrial IoT Empirical Study

    Get PDF
    Connectivity is the defining property of the Internet of Things (IoT). Multiple technologies and techniques allow embedded devices to transmit and receive data. The intersection between connectivity demands, physical and environmental limitations is what triggers the use of a specific technology. Low Power Wide Area (LPWA) wireless communication technologies such as LoRa and LoRaWAN are showing practicality and ease of the use in the field of IoT. LoRa modulation ability to provide devices with longer communication ranges make it an attractive choice in multiple IoT use cases. The energy efficiency and scalability aspects of LoRaWAN protocol trigger the research curiosity around the challenges and opportunities of using the technology. In this study, we provide an extensive overview of the firmware development of an industrial LoRaWAN device. An empirical analysis of the device capabilities provides a deeper understanding of the technology potential and the possible areas of improvements. Energy-efficient firmware design practices are explored, analysed and implemented to provide a foundation for future developments in the field. Furthermore, we propose and evaluate a new LoRaWAN application design that explores over the air firmware configuration in runtime a novel micro update scheme. We devise a simple LoRaWAN energy estimation model and apply it to the proposed application. The same model is used to get an indication of LoRaWAN firmware updates over the air (FUOTA) practicality. The applied model highlighted useful energy optimisation techniques for an improved LoRaWAN firmware development

    Circuit Optimisation using Device Layout Motifs

    Get PDF
    Circuit designers face great challenges as CMOS devices continue to scale to nano dimensions, in particular, stochastic variability caused by the physical properties of transistors. Stochastic variability is an undesired and uncertain component caused by fundamental phenomena associated with device structure evolution, which cannot be avoided during the manufacturing process. In order to examine the problem of variability at atomic levels, the 'Motif' concept, defined as a set of repeating patterns of fundamental geometrical forms used as design units, is proposed to capture the presence of statistical variability and improve the device/circuit layout regularity. A set of 3D motifs with stochastic variability are investigated and performed by technology computer aided design simulations. The statistical motifs compact model is used to bridge between device technology and circuit design. The statistical variability information is transferred into motifs' compact model in order to facilitate variation-aware circuit designs. The uniform motif compact model extraction is performed by a novel two-step evolutionary algorithm. The proposed extraction method overcomes the drawbacks of conventional extraction methods of poor convergence without good initial conditions and the difficulty of simulating multi-objective optimisations. After uniform motif compact models are obtained, the statistical variability information is injected into these compact models to generate the final motif statistical variability model. The thesis also considers the influence of different choices of motif for each device on circuit performance and its statistical variability characteristics. A set of basic logic gates is constructed using different motif choices. Results show that circuit performance and variability mitigation can benefit from specific motif permutations. A multi-stage optimisation methodology is introduced, in which the processes of optimisation are divided into several stages. Benchmark circuits show the efficacy of the proposed methods. The results presented in this thesis indicate that the proposed methods are able to provide circuit performance improvements and are able to create circuits that are more robust against variability

    Combining dynamic and static scheduling in high-level synthesis

    Get PDF
    Field Programmable Gate Arrays (FPGAs) are starting to become mainstream devices for custom computing, particularly deployed in data centres. However, using these FPGA devices requires familiarity with digital design at a low abstraction level. In order to enable software engineers without a hardware background to design custom hardware, high-level synthesis (HLS) tools automatically transform a high-level program, for example in C/C++, into a low-level hardware description. A central task in HLS is scheduling: the allocation of operations to clock cycles. The classic approach to scheduling is static, in which each operation is mapped to a clock cycle at compile time, but recent years have seen the emergence of dynamic scheduling, in which an operation’s clock cycle is only determined at run-time. Both approaches have their merits: static scheduling can lead to simpler circuitry and more resource sharing, while dynamic scheduling can lead to faster hardware when the computation has a non-trivial control flow. This thesis proposes a scheduling approach that combines the best of both worlds. My idea is to use existing program analysis techniques in software designs, such as probabilistic analysis and formal verification, to optimize the HLS hardware. First, this thesis proposes a tool named DASS that uses a heuristic-based approach to identify the code regions in the input program that are amenable to static scheduling and synthesises them into statically scheduled components, also known as static islands, leaving the top-level hardware dynamically scheduled. Second, this thesis addresses a problem of this approach: that the analysis of static islands and their dynamically scheduled surroundings are separate, where one treats the other as black boxes. We apply static analysis including dependence analysis between static islands and their dynamically scheduled surroundings to optimize the offsets of static islands for high performance. We also apply probabilistic analysis to estimate the performance of the dynamically scheduled part and use this information to optimize the static islands for high area efficiency. Finally, this thesis addresses the problem of conservatism in using sequential control flow designs which can limit the throughput of the hardware. We show this challenge can be solved by formally proving that certain control flows can be safely parallelised for high performance. This thesis demonstrates how to use automated formal verification to find out-of-order loop pipelining solutions and multi-threading solutions from a sequential program.Open Acces

    Separation logic for high-level synthesis

    Get PDF
    High-level synthesis (HLS) promises a significant shortening of the digital hardware design cycle by raising the abstraction level of the design entry to high-level languages such as C/C++. However, applications using dynamic, pointer-based data structures remain difficult to implement well, yet such constructs are widely used in software. Automated optimisations that leverage the memory bandwidth of dedicated hardware implementations by distributing the application data over separate on-chip memories and parallelise the implementation are often ineffective in the presence of dynamic data structures, due to the lack of an automated analysis that disambiguates pointer-based memory accesses. This thesis takes a step towards closing this gap. We explore recent advances in separation logic, a rigorous mathematical framework that enables formal reasoning about the memory access of heap-manipulating programs. We develop a static analysis that automatically splits heap-allocated data structures into provably disjoint regions. Our algorithm focuses on dynamic data structures accessed in loops and is accompanied by automated source-to-source transformations which enable loop parallelisation and physical memory partitioning by off-the-shelf HLS tools. We then extend the scope of our technique to pointer-based memory-intensive implementations that require access to an off-chip memory. The extended HLS design aid generates parallel on-chip multi-cache architectures. It uses the disjointness property of memory accesses to support non-overlapping memory regions by private caches. It also identifies regions which are shared after parallelisation and which are supported by parallel caches with a coherency mechanism and synchronisation, resulting in automatically specialised memory systems. We show up to 15x acceleration from heap partitioning, parallelisation and the insertion of the custom cache system in demonstrably practical applications.Open Acces

    Optimising algorithm and hardware for deep neural networks on FPGAs

    Get PDF
    This thesis proposes novel algorithm and hardware optimisation approaches to accelerate Deep Neural Networks (DNNs), including both Convolutional Neural Networks (CNNs) and Bayesian Neural Networks (BayesNNs). The first contribution of this thesis is to propose an adaptable and reconfigurable hardware design to accelerate CNNs. By analysing the computational patterns of different CNNs, a unified hardware architecture is proposed for both 2-Dimension and 3-Dimension CNNs. The accelerator is also designed with runtime adaptability, which adopts different parallelism strategies for different convolutional layers at runtime. The second contribution of this thesis is to propose a novel neural network architecture and hardware design co-optimisation approach, which improves the performance of CNNs at both algorithm and hardware levels. Our proposed three-phase co-design framework decouples network training from design space exploration, which significantly reduces the time-cost of the co-optimisation process. The third contribution of this thesis is to propose an algorithmic and hardware co-optimisation framework for accelerating BayesNNs. At the algorithmic level, three categories of structured sparsity are explored to reduce the computational complexity of BayesNNs. At the hardware level, we propose a novel hardware architecture with the aim of exploiting the structured sparsity for BayesNNs. Both algorithmic and hardware optimisations are jointly applied to push the performance limit.Open Acces

    Démonstration de l'intérêt des dispositifs multi-grilles auto-alignées pour les noeuds sub-10nm.

    Get PDF
    Les nombreuses modifications de la structure du transistor bulk ont permis de poursuivre la miniaturisation jusqu'à sa limite aux nœuds 32/28nm. Les technologies actuelles répondent au besoin d'un meilleur contrôle électrostatique en s'ouvrant vers l'industrialisation de transistors complètement dépletés, avec les architectures sur film mince (FDSOI) ou non planaires (TriGate FinFET bulk). Dans ce dernier cas, le substrat bulk reste limitant pour des applications à basse consommation. La combinaison de la technologie SOI et d'une architecture non-planaire conduit aux transistors TriGate sur SOI (ou TGSOI). Nous verrons l'intérêt de ces dispositifs et démontrerons qu'ils sont compatibles avec les techniques de contrainte. On montrera en particulier les améliorations de mobilité et de courants obtenus sur ces dispositifs de largeur inférieure à 15nm. Des simulations montrent également qu'un dispositif TGSOI peut être compatible avec les techniques de modulation de VT. Enfin, nous démontrons la possibilité de fabriquer des dispositifs ultimes à nanofils empilés avec une grille enrobante par une technique innovante de lithographie tridimensionnelle. La conception, la caractérisation physique et les premiers résultats électriques obtenus seront présentés. Ces solutions peuvent répondre aux besoins des nœuds sub-10nm.Changing the bulk transistor structure was sufficient so far to fulfill the scaling needs. The current technologies answer the needs of electrostatics control with the industrialization of fully depleted transistors, with thin-film (FDSOI) or non-planar (TriGate FinFet bulk) technologies. In the latter, bulk substrate is still an issue for low power applications. Combining SOI with multiple-gate structure gives rise to TriGate on SOI (or TGSOI). We will discuss the interest of such devices and will demonstrate their compatibility with strain techniques. We will focus on the mobility and current enhancement obtained on sub-15nm width devices. Simulations also demonstrate the compatibility of TGSOI with VT modulation technique. Finally, we demonstrate the fabrication through 3D lithography of ultimate stacked nanowires with a gate-all-around. The conception, physical characterization and first electrical results are presented.SAVOIE-SCD - Bib.électronique (730659901) / SudocGRENOBLE1/INP-Bib.électronique (384210012) / SudocGRENOBLE2/3-Bib.électronique (384219901) / SudocSudocFranceF
    • …
    corecore