111 research outputs found
Elastic bundles :modelling and architecting asynchronous circuits with granular rigidity
PhD ThesisIntegrated Circuit (IC) designs these days are predominantly System-on-Chips (SoCs).
The complexity of designing a SoC has increased rapidly over the years due to growing
process and environmental variations coupled with global clock distribution di culty.
Moreover, traditional synchronous design is not apt to handle the heterogeneous timing
nature of modern SoCs. As a countermeasure, the semiconductor industry witnessed
a strong revival of asynchronous design principles. A new paradigm of digital circuits
emerged, as a result, namely mixed synchronous-asynchronous circuits. With a wave
of recent innovations in synchronous-asynchronous CAD integration, this paradigm is
showing signs of commercial adoption in future SoCs mainly due to the scope for reuse
of synchronous functional blocks and IP cores, and the co-existence of synchronous and
asynchronous design styles in a common EDA framework.
However, there is a lack of formal methods and tools to facilitate mixed synchronousasynchronous
design. In this thesis, we propose a formal model based on Petri nets with
step semantics to describe these circuits behaviourally. Implication of this model in the
veri cation and synthesis of mixed synchronous-asynchronous circuits is studied. Till
date, this paradigm has been mainly explored on the basis of Globally Asynchronous
Locally Synchronous (GALS) systems. Despite decades of research, GALS design has
failed to gain traction commercially. To understand its drawbacks, a simulation framework
characterising the physical and functional aspects of GALS SoCs is presented.
A novel method for synthesising mixed synchronous-asynchronous circuits with varying
levels of rigidity is proposed. Starting with a high-level data ow model of a system which
is intrinsically asynchronous, the key idea is to introduce rigidity of chosen granularity
levels in the model without changing functional behaviour. The system is then partitioned
into functional blocks of synchronous and asynchronous elements before being transformed
into an equivalent circuit which can be synthesised using standard EDA tools
Doctor of Philosophy
dissertationAsynchronous design has a very promising potential even though it has largely received a cold reception from industry. Part of this reluctance has been due to the necessity of custom design languages and computer aided design (CAD) flows to design, optimize, and validate asynchronous modules and systems. Next generation asynchronous flows should support modern programming languages (e.g., Verilog) and application specific integrated circuits (ASIC) CAD tools. They also have to support multifrequency designs with mixed synchronous (clocked) and asynchronous (unclocked) designs. This work presents a novel relative timing (RT) based methodology for generating multifrequency designs using synchronous CAD tools and flows. Synchronous CAD tools must be constrained for them to work with asynchronous circuits. Identification of these constraints and characterization flow to automatically derive the constraints is presented. The effect of the constraints on the designs and the way they are handled by the synchronous CAD tools are analyzed and reported in this work. The automation of the generation of asynchronous design templates and also the constraint generation is an important problem. Algorithms for automation of reset addition to asynchronous circuits and power and/or performance optimizations applied to the circuits using logical effort are explored thus filling an important hole in the automation flow. Constraints representing cyclic asynchronous circuits as directed acyclic graphs (DAGs) to the CAD tools is necessary for applying synchronous CAD optimizations like sizing, path delay optimizations and also using static timing analysis (STA) on these circuits. A thorough investigation for the requirements of cycle cutting while preserving timing paths is presented with an algorithm to automate the process of generating them. A large set of designs for 4 phase handshake protocol circuit implementations with early and late data validity are characterized for area, power and performance. Benchmark circuits with automated scripts to generate various configurations for better understanding of the designs are proposed and analyzed. Extension to the methodology like addition of scan insertion using automatic test pattern generation (ATPG) tools to add testability of datapath in bundled data asynchronous circuit implementations and timing closure approaches are also described. Energy, area, and performance of purely asynchronous circuits and circuits with mixed synchronous and asynchronous blocks are explored. Results indicate the benefits that can be derived by generating circuits with asynchronous components using this methodology
Design of Asynchronous Processor
There has been a resurgence of interest in asynchronous design recently. The renewed interest in asynchronous design results from its potential to address the problem faced by the synchronous design methodology. In asynchronous
methodology, there is no global clock controlling the synchronization of a circuit; instead, the data communication between each functional unit is completed through local request-acknowledge handshake protocol. The growth in demand of high performance portable systems has accelerated asynchronous logic design technique which can offers better performance and lower power consumption especially in the development of the asynchronous processor for mobile and portable application. In this thesis, the design and verification of an 8-bit asynchronous pipelined
processor is presented. The developed asynchronous processor is based on Harvard architecture and uses Reduced Instruction Set Computer (RISC) instruction set architecture. 24 instructions are supported by the processor including register, memory, branch and jump operations. The processor has three-stage pipelining i.e.
fetch, decode and execution pipeline. Micropipelines framework with 2-phase signalling protocol and bundled-data approach is employed in designing complex and powerful asynchronous control circuits for the processor. Very High Speed Integrated Circuit Hardware Description Language (VHDL) is used to design and construct all parts of the asynchronous processor. Simulation, synthesis and verification of the processor are carried out using MAX +PLUS II software. The simulation results have demonstrated that the developed 8-bit asynchronous RISC processor is working correctly using current Field Programmable Gate Array (FPGA) technology. This processor employed 903 logic cells and has 6144 memory bits for instruction and data memory. Each of the processor subsystem can operates at different cycle time, thus enable an asynchronous processor achieving 11.95MHz average speed performance
Doctor of Philosophy
dissertationPortable electronic devices will be limited to available energy of existing battery chemistries for the foreseeable future. However, system-on-chips (SoCs) used in these devices are under a demand to offer more functionality and increased battery life. A difficult problem in SoC design is providing energy-efficient communication between its components while maintaining the required performance. This dissertation introduces a novel energy-efficient network-on-chip (NoC) communication architecture. A NoC is used within complex SoCs due it its superior performance, energy usage, modularity, and scalability over traditional bus and point-to-point methods of connecting SoC components. This is the first academic research that combines asynchronous NoC circuits, a focus on energy-efficient design, and a software framework to customize a NoC for a particular SoC. Its key contribution is demonstrating that a simple, asynchronous NoC concept is a good match for low-power devices, and is a fruitful area for additional investigation. The proposed NoC is energy-efficient in several ways: simple switch and arbitration logic, low port radix, latch-based router buffering, a topology with the minimum number of 3-port routers, and the asynchronous advantages of zero dynamic power consumption while idle and the lack of a clock tree. The tool framework developed for this work uses novel methods to optimize the topology and router oorplan based on simulated annealing and force-directed movement. It studies link pipelining techniques that yield improved throughput in an energy-efficient manner. A simulator is automatically generated for each customized NoC, and its traffic generators use a self-similar message distribution, as opposed to Poisson, to better match application behavior. Compared to a conventional synchronous NoC, this design is superior by achieving comparable message latency with half the energy
Custom Cell Placement Automation for Asynchronous VLSI
Asynchronous Very-Large-Scale-Integration (VLSI) integrated circuits have demonstrated many advantages over their synchronous counterparts, including low power consumption, elastic pipelining, robustness against manufacturing and temperature variations, etc. However, the lack of dedicated electronic design automation (EDA) tools, especially physical layout automation tools, largely limits the adoption of asynchronous circuits. Existing commercial placement tools are optimized for synchronous circuits, and require a standard cell library provided by semiconductor foundries to complete the physical design. The physical layouts of cells in this library have the same height to simplify the placement problem and the power distribution network. Although the standard cell methodology also works for asynchronous designs, the performance is inferior compared with counterparts designed using the full-custom design methodology. To tackle this challenge, we propose a gridded cell layout methodology for asynchronous circuits, in which the cell height and cell width can be any integer multiple of two grid values. The gridded cell approach combines the shape regularity of standard cells with the size flexibility of full-custom layouts. Therefore, this approach can achieve a better space utilization ratio and lower wire length for asynchronous designs. Experiments have shown that the gridded cell placement approach reduces area without impacting the routability. We have also used this placer to tape out a chip in a 65nm process technology, demonstrating that our placer generates design-rule clean results
Comparing energy and latency of asynchronous and synchronous NoCs for embedded SoCs
Journal ArticlePower consumption of on-chip interconnects is a primary concern for many embedded system-on-chip (SoC) applications. In this paper, we compare energy and performance characteristics of asynchronous (clockless) and synchronous network on-chip implementations, optimized for a number of SoC designs. We adapted the COSI-2.0 framework with ORION 2.0 router and wire models for synchronous network generation. Our own tool, ANetGen, specifies the asynchronous network by determining the topology with simulated-annealing and router locations with force-directed placement. It uses energy and delay models from our 65 nm bundled-data router design. SystemC simulations varied traffic burstiness using the self-similar b-model. Results show that the asynchronous network provided lower median and maximum message latency, especially under bursty traffic, and used far less router energy with a slight overhead for the interrouter wires
Architectural Exploration of KeyRing Self-Timed Processors
RĂSUMĂ
Les derniĂšres dĂ©cennies ont vu lâaugmentation des performances des processeurs contraintes
par les limites imposĂ©es par la consommation dâĂ©nergie des systĂšmes Ă©lectroniques : des trĂšs
basses consommations requises pour les objets connectés, aux budgets de dépenses électriques
des serveurs, en passant par les limitations thermiques et la durée de vie des batteries des
appareils mobiles. Cette forte demande en processeurs efficients en énergie, couplée avec
les limitations de la rĂ©duction dâĂ©chelle des transistorsâqui ne permet plus dâamĂ©liorer les
performances Ă densitĂ© de puissance constanteâ, conduit les concepteurs de circuits intĂ©grĂ©s
Ă explorer de nouvelles microarchitectures permettant dâobtenir de meilleures performances
pour un budget Ă©nergĂ©tique donnĂ©. Cette thĂšse sâinscrit dans cette tendance en proposant
une nouvelle microarchitecture de processeur, appelĂ©e KeyRing, conçue avec lâintention de
rĂ©duire la consommation dâĂ©nergie des processeurs.
La frĂ©quence dâopĂ©ration des transistors dans les circuits intĂ©grĂ©s est proportionnelle Ă leur
consommation dynamique dâĂ©nergie. Par consĂ©quent, les techniques de conception permettant
de réduire dynamiquement le nombre de transistors en opération sont trÚs largement
adoptĂ©es pour amĂ©liorer lâefficience Ă©nergĂ©tique des processeurs. La technique de clock-gating
est particuliĂšrement usitĂ©e dans les circuits synchrones, car elle rĂ©duit lâimpact de lâhorloge
globale, qui est la principale source dâactivitĂ©. La microarchitecture KeyRing prĂ©sentĂ©e dans
cette thÚse utilise une méthode de synchronisation décentralisée et asynchrone pour réduire
lâactivitĂ© des circuits. Elle est dĂ©rivĂ©e du processeur AnARM, un processeur dĂ©veloppĂ© par
Octasic sur la base dâune microarchitecture asynchrone ad hoc. Bien quâil soit plus efficient
en Ă©nergie que des alternatives synchrones, le AnARM est essentiellement incompatible avec
les mĂ©thodes de synthĂšse et dâanalyse temporelle statique standards. De plus, sa technique
de conception ad hoc ne sâinscrit que partiellement dans les paradigmes de conceptions asynchrones.
Cette thÚse propose une approche rigoureuse pour définir les principes généraux
de cette technique de conception ad hoc, en faisant levier sur la littérature asynchrone. La
microarchitecture KeyRing qui en résulte est développée en association avec une méthode
de conception automatisĂ©e, qui permet de sâaffranchir des incompatibilitĂ©s natives existant
entre les outils de conception et les systÚmes asynchrones. La méthode proposée permet de
pleinement mettre Ă profit les flots de conception standards de lâindustrie microĂ©lectronique
pour réaliser la synthÚse et la vérification des circuits KeyRing. Cette thÚse propose également
des protocoles expérimentaux, dont le but est de renforcer la relation de causalité
entre la microarchitecture KeyRing et une réduction de la consommation énergétique des
processeurs, comparativement Ă des alternatives synchrones Ă©quivalentes.----------ABSTRACT
Over the last years, microprocessors have had to increase their performances while keeping
their power envelope within tight bounds, as dictated by the needs of various markets: from
the ultra-low power requirements of the IoT, to the electrical power consumption budget
in enterprise servers, by way of passive cooling and day-long battery life in mobile devices.
This high demand for power-efficient processors, coupled with the limitations of technology
scalingâwhich no longer provides improved performances at constant power densitiesâ, is
leading designers to explore new microarchitectures with the goal of pulling more performances
out of a fixed power budget. This work enters into this trend by proposing a new
processor microarchitecture, called KeyRing, having a low-power design intent.
The switching activity of integrated circuitsâi.e. transistors switching on and offâdirectly
affects their dynamic power consumption. Circuit-level design techniques such as clock-gating
are widely adopted as they dramatically reduce the impact of the global clock in synchronous
circuits, which constitutes the main source of switching activity. The KeyRing microarchitecture
presented in this work uses an asynchronous clocking scheme that relies on decentralized
synchronization mechanisms to reduce the switching activity of circuits. It is derived from
the AnARM, a power-efficient ARM processor developed by Octasic using an ad hoc asynchronous
microarchitecture. Although it delivers better power-efficiency than synchronous
alternatives, it is for the most part incompatible with standard timing-driven synthesis and
Static Timing Analysis (STA). In addition, its design style does not fit well within the existing
asynchronous design paradigms. This work lays the foundations for a more rigorous
definition of this rather unorthodox design style, using circuits and methods coming from the
asynchronous literature. The resulting KeyRing microarchitecture is developed in combination
with Electronic Design Automation (EDA) methods that alleviate incompatibility issues
related to ad hoc clocking, enabling timing-driven optimizations and verifications of KeyRing
circuits using industry-standard design flows. In addition to bridging the gap with standard
design practices, this work also proposes comprehensive experimental protocols that aims to
strengthen the causal relation between the reported asynchronous microarchitecture and a
reduced power consumption compared with synchronous alternatives.
The main achievement of this work is a framework that enables the architectural exploration
of circuits using the KeyRing microarchitecture
The design of an asynchronous BCJR/MAP convolutional channel decoder.
The digital design alternative to the everyday synchronous circuit design paradigm is the asynchronous model. Asynchronous circuits are also known as handshaking circuits and they may prove to be a feasible design alternative in the modern digital Very Large Scale Integration (VLSI) design environment. Asynchronous circuits and systems offer the possibility of lower system power requirements, reduced noise, elimination of clock skew and many other benefits. Channel coding is a useful means of eliminating erroneous transmission due to the communication channel\u27s physical limits. Convolutional coding has come to the forefront of channel coding discussions due to the usefulness of turbo codes. The niche market for turbo codes have typically been in satellite communication. The usefulness of turbo codes are now expanding into the next generation of handheld communication products. It is probable that the turbo coding scheme will reside in the next cellular phone one purchases [1]. Turbo coding uses two BCJR decoders in its implementation. The BCJR decoding algorithm was named after its creators Bahl, Cocke, Jelinek, and Raviv (BCJR). The BCJR algorithm is sometimes known as a Maximum Priori Posteriori (MAP) algorithm. This means a very large part of the turbo coding research will encompass the BCJR/MAP decoder and its optimization for size, power and performance. An investigation into the design of a BCJR/MAP convolutional channel decoder will be introduced. This will encompass the use and synthesis of an asynchronous Hardware Definition Language (HDL) called Balsa. The design will be carried through to the gate implementation level. Proper gate level analysis will identify the key metrics that will determine the feasibility of an asynchronous design of that of the everyday clocked paradigm.* *This dissertation is a compound document (contains both a paper copy and a CD as part of the dissertation).Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .P47. Source: Masters Abstracts International, Volume: 43-05, page: 1782. Adviser: Kemal Tepe. Thesis (M.A.Sc.)--University of Windsor (Canada), 2005
Introducing KeyRing selfâtimed microarchitecture and timingâdriven design flow
ABSTRACT: A self-timed microarchitecture called KeyRing is presented, and a method for implementing KeyRing circuits compatible with a timing-driven electronic design automation (EDA) flow is discussed. The KeyRing microarchitecture is derived from the AnARM, a low-power self-timed ARM processor based on ad hoc design principles. First, the unorthodox design style and circuit structures are revisited. A theoretical model that can support the design of generic circuits and the elaboration of EDA methods is then presented. Also addressed are the compatibility issues between KeyRing circuits and timing-driven EDA flows. The proposed method leverages relative timing constraints to translate the timing relations in a KeyRing circuit into a set of timing constraints that enable timing-driven synthesis and static timing analysis. Finally, two 32-bit RISC-V processors are presented; called KeyV and based on KeyRing microarchitectures, they are synthesized in a 65 nm technology using the proposed EDA flow. Postsynthesis results demonstrate the effectiveness of the design methodology and allow comparisons with a synchronous alternative called SynV. Performance and power consumption evaluations show that KeyV has a power efficiency that lies between SynV with clock-gating and SynV without clock-gating
- âŠ