734 research outputs found
Timing Analysis and Optimization in Logic and Physical Synthesis
Doctoral dissertation (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2020. Taewhan Kim.
Timing analysis is one of the necessary steps in the development of a semiconductor circuit, and it is increasingly important in advanced process technologies due to various factors, including the increase in process-voltage-temperature variation. This dissertation addresses three problems related to timing analysis and optimization in logic and physical synthesis. Firstly, most static timing analyses today are based on conventional fixed flip-flop timing models, in which every flip-flop is assumed to have a fixed clock-to-Q delay. In reality, however, setup and hold skews affect the clock-to-Q delay. In this dissertation, I propose a mathematical formulation of this dependency and apply it, together with a scalable speedup technique, both to clock skew scheduling problems and to the timing analysis of a given circuit. Secondly, near-threshold computing is one of the promising concepts for energy-efficient operation of VLSI systems, but wide performance variation and nonlinearity with respect to process variations hinder its adoption. To cope with this, I propose a holistic hardware performance monitoring methodology for accurate timing prediction in a near-threshold voltage regime and advanced process technology. Lastly, an asynchronous circuit is one of the alternatives to the conventional synchronous style, and asynchronous pipeline circuits are especially attractive because of their small design effort. This dissertation addresses the synthesis problem of lightening two-phase bundled-data asynchronous pipeline controllers, in which delay buffers are essential for guaranteeing correct handshaking operation but incur a considerable area increase.
1 INTRODUCTION
1.1 Flexible Flip-Flop Timing Model
1.2 Hardware Performance Monitoring Methodology
1.3 Asynchronous Pipeline Controller
1.4 Contributions of this Dissertation
2 ANALYSIS AND OPTIMIZATION CONSIDERING FLEXIBLE FLIP-FLOP TIMING MODEL
2.1 Preliminaries
2.1.1 Terminologies
2.1.2 Timing Analysis
2.1.3 Clock-to-Q Delay Surface Modeling
2.2 Clock-to-Q Delay Interval Analysis
2.2.1 Derivation
2.2.2 Additional Constraints
2.2.3 Analysis: Finding Minimum Clock Period
2.2.4 Optimization: Clock Skew Scheduling
2.2.5 Scalable Speedup Technique
2.3 Experimental Results
2.3.1 Application to Minimum Clock Period Finding
2.3.2 Application to Clock Skew Scheduling
2.3.3 Efficacy of Scalable Speedup Technique
2.4 Summary
3 HARDWARE PERFORMANCE MONITORING METHODOLOGY AT NTC AND ADVANCED TECHNOLOGY NODE
3.1 Overall Flow of Proposed HPM Methodology
3.2 Prerequisites to HPM Methodology
3.2.1 BEOL Process Variation Modeling
3.2.2 Surrogate Model Preparation
3.3 HPM Methodology: Design Phase
3.3.1 HPM2PV Model Construction
3.3.2 Optimization of Monitoring Circuits Configuration
3.3.3 PV2CPT Model Construction
3.4 HPM Methodology: Post-Silicon Phase
3.4.1 Transfer Learning in Silicon Characterization Step
3.4.2 Procedures in Volume Production Phase
3.5 Experimental Results
3.5.1 Experimental Setup
3.5.2 Exploration of Monitoring Circuits Configuration
3.5.3 Effectiveness of Monitoring Circuits Optimization
3.5.4 Considering BEOL PVs and Uncertainty Learning
3.5.5 Comparison among Different Prediction Flows
3.5.6 Effectiveness of Prediction Model Calibration
3.6 Summary
4 LIGHTENING ASYNCHRONOUS PIPELINE CONTROLLER
4.1 Preliminaries and State-of-the-Art Work
4.1.1 Bundled-data vs. Dual-rail Asynchronous Circuits
4.1.2 Two-phase vs. Four-phase Bundled-data Protocol
4.1.3 Conventional State-of-the-Art Pipeline Controller Template
4.2 Delay Path Sharing for Lightening Pipeline Controller Template
4.2.1 Synthesizing Sharable Delay Paths
4.2.2 Validating Logical Correctness for Sharable Delay Paths
4.2.3 Reformulating Timing Constraints of Controller Template
4.2.4 Minimally Allocating Delay Buffers
4.3 In-depth Pipeline Controller Template Synthesis with Delay Path Reusing
4.3.1 Synthesizing Delay Path Units
4.3.2 Validating Logical Correctness of Delay Path Units
4.3.3 Updating Timing Constraints for Delay Path Units
4.3.4 In-depth Synthesis Flow Utilizing Delay Path Units
4.4 Experimental Results
4.4.1 Environment Setup
4.4.2 Piecewise Linear Modeling of Delay Path Unit Area
4.4.3 Comparison of Power, Performance, and Area
4.5 Summary
5 CONCLUSION
5.1 Chapter 2
5.2 Chapter 3
5.3 Chapter 4
Abstract (In Korean)
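The conventional fixed flip-flop timing model that the abstract above contrasts with can be illustrated with a short sketch. It assumes textbook setup and hold constraints with one fixed clock-to-Q delay for every flip-flop. The names `Path`, `min_clock_period`, and `hold_violations` are hypothetical, and the dissertation's flexible model, in which clock-to-Q delay depends on setup and hold skews, is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Combinational path between two flip-flops (hypothetical structure)."""
    launch: str    # name of the launching flip-flop
    capture: str   # name of the capturing flip-flop
    d_min: float   # shortest combinational delay
    d_max: float   # longest combinational delay


def min_clock_period(paths, clk_arrival, t_cq, t_setup):
    """Smallest period T meeting every setup constraint.

    Setup constraint for a path i -> j under the fixed model:
        s_i + t_cq + d_max + t_setup <= s_j + T
    where s_x is the clock arrival time at flip-flop x.
    """
    return max(
        clk_arrival[p.launch] + t_cq + p.d_max + t_setup - clk_arrival[p.capture]
        for p in paths
    )


def hold_violations(paths, clk_arrival, t_cq, t_hold):
    """Paths violating the hold constraint s_i + t_cq + d_min >= s_j + t_hold."""
    return [
        p for p in paths
        if clk_arrival[p.launch] + t_cq + p.d_min < clk_arrival[p.capture] + t_hold
    ]


# Two flip-flops in a loop: a slow path a -> b and a fast path b -> a.
paths = [Path("a", "b", 0.2, 1.0), Path("b", "a", 0.1, 0.5)]
zero_skew = {"a": 0.0, "b": 0.0}
scheduled = {"a": 0.0, "b": 0.25}  # delay b's clock to borrow time for a -> b

period_zero = min_clock_period(paths, zero_skew, t_cq=0.1, t_setup=0.05)
period_sched = min_clock_period(paths, scheduled, t_cq=0.1, t_setup=0.05)
```

In this toy circuit, delaying the clock of flip-flop `b` balances the two paths and shortens the achievable period, which is the basic idea behind clock skew scheduling. All delay values are made up for illustration.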
Design, analysis and implementation of voltage sensor for power-constrained systems
PhD Thesis
Thanks to an extensive effort by the global research community, electronic technology has matured significantly over the last decade. This technology has enabled operations that humans could not otherwise easily perform. For instance, electronic systems can be used for sensing, monitoring, and even control in environments such as outer space, underground, under the sea, or inside the human body. The main difficulty for electronics operating in these environments is access to a reliable and permanent source of energy. Batteries, the immediate solution to this problem, provide energy for limited periods of time but require regular maintenance and replacement. Consequently, battery solutions fail wherever replacement is not possible or operation over long periods is needed. For such cases, researchers have proposed harvesting ambient energy and converting it into electrical form. An important issue with energy harvesters is that their operation and output power depend critically on the amount of energy they receive; because ambient energy tends to be sporadic, energy harvesters cannot produce stable or fixed levels of power all of the time. Therefore, electronic devices powered in this way must be capable of adapting their operation to the energy status of the harvester. To achieve this, information on the energy available for use is needed, which can be provided by a sensor capable of measuring voltage. However, most traditional voltage measurement devices require stable and fixed voltage and time references, which generally do not exist in energy harvesting environments. A further challenge is that such a sensor also needs to be powered by the energy harvester's unstable voltage.
In this thesis, the design of a reference-free voltage sensor, which can operate with a varying voltage source, is provided based on the capture of a portion of the total energy which is directly related to the energy being sensed. This energy is then used to power a computation which quantifies captured energy over time, with the information directly generated as digital code. The sensor was fabricated in the 180 nm technology node and successfully tested by performing voltage measurements over the range 1.8 V to 0.8 V.
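The idea of turning captured energy directly into a digital code can be illustrated with a toy model. This is only a sketch under strong assumptions that are not taken from the thesis: a sampling capacitor is charged to the sensed voltage, each counting step shares charge with a small step capacitor that is then emptied, and counting stops once the voltage falls below a stop level. The function name and all component values are hypothetical.

```python
def charge_to_code(v_start, v_stop=0.4, c_sample=1e-12, c_step=2e-14):
    """Count how many charge-sharing steps a sampled charge can power.

    Assumed toy model: each step multiplies the sampled voltage by
    c_sample / (c_sample + c_step); the count of steps taken before the
    voltage drops below v_stop is the output code.
    """
    if v_start <= 0 or v_stop <= 0:
        raise ValueError("voltages must be positive")
    ratio = c_sample / (c_sample + c_step)
    voltage, code = v_start, 0
    while voltage >= v_stop:
        voltage *= ratio  # one counting step consumes part of the charge
        code += 1
    return code


# Codes for three supply voltages inside the measured 0.8 V to 1.8 V range.
codes = [charge_to_code(v) for v in (0.8, 1.2, 1.8)]
```

In this model a higher sampled voltage always yields an equal or larger code, and no external voltage or time reference is needed, because the count depends only on capacitor ratios and the stop level.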
Master of Science thesis
Integrated circuits often consist of multiple processing elements that are regularly tiled across the two-dimensional surface of a die. This work presents the design and integration of high-speed relative-timed routers for asynchronous networks-on-chip (NoCs). It investigates NoC efficiency through simplicity by directly translating a simple T-router design (source routing, single-flit packets) to higher-radix routers. The work studies the performance and power trade-offs of adding higher-radix routers, 3D topologies, virtual channels, accurate NoC modeling, and transmission-line communication links. Routers with and without virtual channels are designed and integrated into arrayed communication networks. Furthermore, the work investigates 3D networks with diffusive RC wires and transmission lines on long wrap interconnects.
Elastic circuits
Elasticity in circuits and systems provides tolerance to variations in computation and communication delays. This paper presents a comprehensive overview of elastic circuits for those designers who are mainly familiar with synchronous design. Elasticity can be implemented both synchronously and asynchronously, although it was traditionally more often associated with asynchronous circuits. This paper shows that synchronous and asynchronous elastic circuits can be designed, analyzed, and optimized using similar techniques. Thus, choices between synchronous and asynchronous implementations are localized and deferred until late in the design process.
Peer Reviewed. Postprint (published version)
On Multicast in Asynchronous Networks-on-Chip: Techniques, Architectures, and FPGA Implementation
In this era of exascale computing, conventional synchronous design techniques are facing unprecedented challenges. The consumer electronics market is replete with many-core systems in the range of 16 cores to thousands of cores on chip, integrating multi-billion transistors. However, with this ever-increasing complexity, the traditional design approaches are facing key issues such as increasing chip power, process variability, aging, thermal problems, and scalability. An alternative paradigm that has gained significant interest in the last decade is asynchronous design. Asynchronous designs have several potential advantages: they are naturally energy proportional, burning power only when active, do not require complex clock distribution, are robust to different forms of variability, and provide ease of composability for heterogeneous platforms. Networks-on-chip (NoCs) are an interconnect paradigm that has been introduced to deal with the ever-increasing system complexity. NoCs provide a distributed, scalable, and efficient interconnect solution for today's many-core systems. Moreover, NoCs are a natural match with asynchronous design techniques, as they separate communication infrastructure and timing from the computational elements. To this end, globally-asynchronous locally-synchronous (GALS) systems that interconnect multiple processing cores, operating at different clock speeds, using an asynchronous NoC, have gained significant interest. While asynchronous NoCs have several advantages, they also face a key challenge of supporting new types of traffic patterns. One such pattern is multicast communication, where a source sends packets to an arbitrary number of destinations. Multicast is common not only in parallel computing, such as for cache coherency, but also in emerging areas such as neuromorphic computing. This important capability has been largely missing from asynchronous NoCs.
This thesis introduces several efficient multicast solutions for these interconnects. In particular, techniques and network architectures are introduced to support high-performance and low-power multicast. Two leading network topologies are the focus: a variant mesh-of-trees (MoT) and a 2D mesh. In addition, for a more realistic implementation and analysis, as well as to significantly advance the field of asynchronous NoCs, this thesis also targets synthesis of these NoCs on commercial FPGAs. While there have been significant advances in FPGA technologies, there has been only limited research on implementing asynchronous NoCs on FPGAs. To this end, a systematic computer-aided design (CAD) methodology has been introduced to efficiently and safely map asynchronous NoCs on FPGAs. Overall, this thesis makes the following three contributions. The first contribution is a multicast solution for a variant MoT network topology. This topology consists of simple low-radix switches, and has been used in high-performance computing platforms. A novel local speculation technique is introduced, where a subset of the network's switches are speculative and always broadcast every packet. These switches are very simple and have high performance. Speculative switches are surrounded by non-speculative ones that route packets based on their destinations and also throttle any redundant copies created by the former. This hybrid network architecture achieved significant performance and power benefits over other multicast approaches. The second contribution is a multicast solution for a 2D-mesh topology, which is more complex, with higher-radix switches, and is also more commonly used. A novel continuous-time replication strategy is introduced to optimize the critical multi-way forking operation of a multicast transmission.
In this technique, a multicast packet is first stored in an input port of a switch, from where it is sent through distinct output ports towards different destinations concurrently, at each output's own rate and in continuous time. This strategy is shown to have significant latency and energy benefits over an approach that performs multicast using multiple distinct serial unicasts to each destination. Finally, a systematic CAD methodology is introduced to synthesize asynchronous NoCs on commercial FPGAs. A two-fold goal is targeted: correctness and high performance. For ease of implementation, only existing FPGA synthesis tools are used. Moreover, since asynchronous NoCs involve special asynchronous components, a comprehensive guide is introduced to map these elements correctly and efficiently. Two asynchronous NoC switches are synthesized using the proposed approach on a leading Xilinx FPGA in 28 nm: one that only handles unicast, and the other that also supports multicast. Both showed significant energy benefits with some performance gains over a state-of-the-art synchronous switch.
Design and analysis of SRAMs for energy harvesting systems
PhD Thesis
At present, the battery is employed as a power source for a wide variety of microelectronic systems, ranging from biomedical implants and sensor networks to portable devices. However, the battery has several limitations and poses many challenges for the majority of these systems. For instance, the design of implantable devices must consider the battery in two respects: the toxic materials it contains, and its lifetime, since replacing the battery means a surgical operation. Another challenge appears in wireless sensor networks, where hundreds or thousands of nodes are scattered around the monitored environment and the battery of each node must be maintained and replaced regularly, yet the batteries in these nodes do not all run out at the same time.
Since the introduction of portable systems, the area of low-power design has witnessed extensive research, driven by industrial needs, with the aim of extending battery life. Coincidentally, continuing innovations in the field of micro-generators have brought their outputs into the same range as the power needs of several portable applications. This overlap creates a clear opportunity to develop new generations of electronic systems that can be powered, or at least augmented, by energy harvesters. Such self-powered systems benefit applications where maintaining and replacing batteries is impossible, inconvenient, costly, or hazardous, in addition to decreasing the adverse effects the battery has on the environment.
The main goal of this research study is to investigate energy-harvesting-aware design techniques for computational logic in order to enable operation under non-deterministic energy sources. As a case study, the research concentrates on a vital part of all computational loads, SRAM, which occupies more than 90% of the chip area according to the ITRS reports.
Essentially, this research conducted experiments to find the design metric of an SRAM that is most vulnerable to unpredictable energy sources, which has been confirmed to be the timing. Accordingly, the study proposed a truly self-timed SRAM that is realized based on complete handshaking protocols in the 6T bit-cell, regulated by fully Speed Independent (SI) timing circuitry. The study proved the functionality of the proposed design in real silicon. Finally, the project enhanced other performance metrics of the self-timed SRAM, concentrating on the bit-line length and the minimum operational voltage, by employing several additional design techniques.
Umm Al-Qura University, the Ministry of Higher Education in the Kingdom of Saudi Arabia, and the Saudi Cultural Bureau
Design and performance optimization of asynchronous networks-on-chip
As digital systems continue to grow in complexity, the design of conventional synchronous systems is facing unprecedented challenges. The number of transistors on individual chips is already in the multi-billion range, and a greatly increasing number of components are being integrated onto a single chip. As a consequence, modern digital designs are under strong time-to-market pressure, and there is a critical need for composable design approaches for large complex systems.
In the past two decades, networks-on-chip (NoCs) have been a highly active research area. In a NoC-based system, functional blocks are first designed individually and may run at different clock rates. These modules are then connected through a structured network for on-chip global communication. However, due to the rigidity of centrally-clocked NoCs, there have been bottlenecks of system scalability, energy and performance, which cannot be easily solved with synchronous approaches. As a result, there has been significant recent interest in combining the notion of asynchrony with NoC designs. Since the NoC approach inherently separates the communication infrastructure, and its timing, from computational elements, it is a natural match for an asynchronous paradigm. Asynchronous NoCs, therefore, enable a modular and extensible system composition for an "object-orient" design style.
The thesis aims to significantly advance the state of the art and viability of asynchronous and globally-asynchronous locally-synchronous (GALS) networks-on-chip, to enable high-performance and low-energy systems. The proposed asynchronous NoCs are nearly entirely based on standard cells, which eases their integration into industrial design flows. The contributions are instantiated in three different directions.
First, practical acceleration techniques are proposed for optimizing the system latency, in order to break through the latency bottleneck in the memory interfaces of many on-chip parallel processors. Novel asynchronous network protocols are proposed, along with concrete NoC designs. A new concept, called "monitoring network", is introduced. Monitoring networks are lightweight shadow networks used for fast-forwarding anticipated traffic information, ahead of the actual packet traffic. The routers are therefore allowed to initiate and perform arbitration and channel allocation in advance. The technique is successfully applied to two topologies which belong to two different categories: a variant mesh-of-trees (MoT) structure and a 2D-mesh topology. Considerable and stable latency improvements are observed across a wide range of traffic patterns, along with moderate throughput gains.
Second, for the first time, a high-performance and low-power asynchronous NoC router is compared directly to a leading commercial synchronous counterpart in an advanced industrial technology. The asynchronous router design shows significant performance improvements, as well as area and power savings. The proposed asynchronous router integrates several advanced techniques, including a low-latency circular FIFO for buffer design, and a novel end-to-end credit-based virtual channel (VC) flow control. In addition, a semi-automated design flow is created, which uses portions of a standard synchronous tool flow.
Finally, a high-performance multi-resource asynchronous arbiter design is developed. This small but important component can be directly used in existing asynchronous NoCs for performance optimization. In addition, this standalone design promises use in opening up new NoC directions, as well as for general use in parallel systems. In the proposed arbiter design, the allocation of a resource to a client is divided into several steps. Multiple successive client-resource pairs can be selected rapidly in pipelined sequence, and the completion of the assignments can overlap in parallel.
In sum, the thesis provides a set of advanced design solutions for performance optimization of asynchronous and GALS networks-on-chip. These solutions are at different levels, from network protocols, down to router- and component-level optimizations, which can be directly applied to existing basic asynchronous NoC designs to provide a leap in performance improvement.
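The credit-based flow control mentioned in the second contribution can be illustrated generically. The sketch below shows only the basic credit mechanism on a single link, assuming a downstream buffer of fixed depth. The class name `CreditLink` and its methods are hypothetical, and the thesis's end-to-end virtual-channel scheme is not described in the abstract and is not reproduced here.

```python
from collections import deque


class CreditLink:
    """Single link with credit-based flow control (generic illustration)."""

    def __init__(self, buffer_depth):
        if buffer_depth <= 0:
            raise ValueError("buffer_depth must be positive")
        self.credits = buffer_depth  # one credit per free downstream slot
        self.buffer = deque()        # flits currently held downstream

    def send(self, flit):
        """Send a flit if a credit is available; return whether it was sent."""
        if self.credits == 0:
            return False             # sender stalls, so the buffer cannot overflow
        self.credits -= 1
        self.buffer.append(flit)
        return True

    def drain(self):
        """Receiver consumes the oldest flit and returns one credit."""
        if not self.buffer:
            return None
        self.credits += 1
        return self.buffer.popleft()


link = CreditLink(buffer_depth=2)
sent = [link.send("a"), link.send("b"), link.send("c")]  # third send stalls
first_out = link.drain()                                 # frees one slot
retry = link.send("c")                                   # now succeeds
```

The sender never needs to observe the receiver's buffer directly; it tracks only its own credit count, which is why the mechanism suits links where the two sides do not share a clock.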
Doctor of Philosophy dissertation
Portable electronic devices will be limited to the available energy of existing battery chemistries for the foreseeable future. However, systems-on-chip (SoCs) used in these devices are under demand to offer more functionality and increased battery life. A difficult problem in SoC design is providing energy-efficient communication between its components while maintaining the required performance. This dissertation introduces a novel energy-efficient network-on-chip (NoC) communication architecture. A NoC is used within complex SoCs due to its superior performance, energy usage, modularity, and scalability over traditional bus and point-to-point methods of connecting SoC components. This is the first academic research that combines asynchronous NoC circuits, a focus on energy-efficient design, and a software framework to customize a NoC for a particular SoC. Its key contribution is demonstrating that a simple, asynchronous NoC concept is a good match for low-power devices, and is a fruitful area for additional investigation. The proposed NoC is energy-efficient in several ways: simple switch and arbitration logic, low port radix, latch-based router buffering, a topology with the minimum number of 3-port routers, and the asynchronous advantages of zero dynamic power consumption while idle and the lack of a clock tree. The tool framework developed for this work uses novel methods to optimize the topology and router floorplan based on simulated annealing and force-directed movement. It studies link pipelining techniques that yield improved throughput in an energy-efficient manner. A simulator is automatically generated for each customized NoC, and its traffic generators use a self-similar message distribution, as opposed to Poisson, to better match application behavior. Compared to a conventional synchronous NoC, this design is superior by achieving comparable message latency with half the energy.
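The contrast between Poisson and self-similar traffic in the abstract above can be illustrated with a small sketch. It assumes that heavy-tailed (Pareto) gaps between messages are an acceptable stand-in for bursty traffic; heavy tails are a common ingredient of self-similar traffic models, but the dissertation's actual generator is not described in the abstract. The function names and parameter values are hypothetical.

```python
import random


def poisson_gaps(n, rate, seed=0):
    """Inter-arrival gaps of a Poisson process (exponentially distributed)."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n)]


def pareto_gaps(n, alpha, scale, seed=0):
    """Heavy-tailed inter-arrival gaps; every gap is at least `scale`."""
    rng = random.Random(seed)
    return [scale * rng.paretovariate(alpha) for _ in range(n)]


def peak_to_mean(gaps):
    """Ratio of the longest gap to the average gap, a rough burstiness hint."""
    return max(gaps) / (sum(gaps) / len(gaps))


smooth = poisson_gaps(1000, rate=1.0, seed=1)
bursty = pareto_gaps(1000, alpha=1.5, scale=1.0, seed=1)
```

Comparing `peak_to_mean(smooth)` with `peak_to_mean(bursty)` over several seeds gives a feel for how the heavy-tailed source mixes dense bursts with occasional very long silences, which a Poisson source smooths out.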