61 research outputs found
On Energy Efficient Computing Platforms
In accordance with Moore's law, the increasing number of on-chip integrated transistors has given modern computing platforms not only higher processing power but also more affordable prices. As a result, these platforms, including portable devices, workstations and data centres, are becoming an integral part of human society. However, with the demand for portability and the rising cost of power, energy efficiency has emerged as a major concern for modern computing platforms.
As the complexity of on-chip systems increases, Network-on-Chip (NoC) has proven to be an efficient communication architecture that can further improve system performance and scalability while reducing design cost. Therefore, in this thesis, we study and propose energy optimization approaches based on the NoC architecture, with a special focus on the following aspects.
As the architectural trend of future computing platforms, 3D systems have many benefits including higher integration density, smaller footprint, heterogeneous integration, etc. Moreover, 3D technology can significantly improve the network communication and effectively avoid long wires, and therefore provide higher system performance and energy efficiency.
Given the dynamic nature of on-chip communication in large-scale NoC-based systems, run-time system optimization is of crucial importance for achieving higher system reliability and, ultimately, energy efficiency. In this thesis, we propose an agent-based system design approach in which agents are on-chip components that monitor and control system parameters such as supply voltage and operating frequency. With this approach, we have analysed the implementation alternatives for dynamic voltage and frequency scaling and power gating techniques at different granularities, which reduce both dynamic and leakage energy consumption.
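The effect of dynamic voltage and frequency scaling on dynamic energy can be illustrated with a small sketch. It assumes the standard first-order CMOS model, in which dynamic power is roughly activity × capacitance × Vdd² × frequency; that model, the function names, and every numeric value below are illustrative assumptions and are not taken from the thesis.

```python
def dynamic_power(activity, capacitance, vdd, freq_hz):
    """First-order CMOS dynamic power in watts: alpha * C * Vdd^2 * f (assumed model)."""
    return activity * capacitance * vdd ** 2 * freq_hz


def task_energy(cycles, activity, capacitance, vdd, freq_hz):
    """Dynamic energy in joules for a task needing a fixed number of clock cycles."""
    runtime = cycles / freq_hz  # a lower frequency stretches the runtime
    return dynamic_power(activity, capacitance, vdd, freq_hz) * runtime


# Hypothetical operating points: nominal versus a scaled-down voltage/frequency pair.
nominal = task_energy(cycles=1_000_000, activity=0.2, capacitance=1e-9,
                      vdd=1.0, freq_hz=1.0e9)
scaled = task_energy(cycles=1_000_000, activity=0.2, capacitance=1e-9,
                     vdd=0.8, freq_hz=0.5e9)
ratio = scaled / nominal  # depends only on (0.8 / 1.0) ** 2 in this model
```

In this simplified model the frequency cancels out of the energy for a fixed cycle count, so the saving comes from the lower supply voltage that the lower frequency permits. Leakage energy, which the abstract addresses through power gating, is not modelled here.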
Topologies, being one of the key factors for NoCs, are also explored for energy savings. A Honeycomb NoC architecture is proposed in this thesis with turn-model based deadlock-free routing algorithms. Our analysis and simulation-based evaluation show that Honeycomb NoCs outperform their Mesh-based counterparts in terms of network cost, system performance and energy efficiency.
Exploration and Design of Power-Efficient Networked Many-Core Systems
Multiprocessing is a promising solution to meet the requirements of near-future applications. To get the full benefit of parallel processing, a many-core system needs an efficient on-chip communication architecture. Network-on-Chip (NoC) is a general-purpose communication concept that offers high throughput and reduced power consumption, and keeps complexity in check through a regular composition of basic building blocks. This thesis presents power-efficient communication approaches for networked many-core systems. We address a range of issues that are important for designing power-efficient many-core systems at two different levels: the network level and the router level.
From the network-level point of view, exploiting state-of-the-art concepts such as Globally Asynchronous Locally Synchronous (GALS), Voltage/Frequency Island (VFI), and 3D Networks-on-Chip approaches may be a solution to the excessive power consumption demanded by today's and future many-core systems. To this end, a low-cost 3D NoC architecture, based on high-speed GALS-based vertical channels, is proposed to mitigate high peak temperatures, power densities, and area footprints of vertical interconnects in 3D ICs. To further exploit the beneficial feature of a negligible inter-layer distance of 3D ICs, we propose a novel hybridization scheme for inter-layer communication. In addition, an efficient adaptive routing algorithm is presented which enables congestion-aware and reliable communication for the hybridized NoC architecture. An integrated monitoring and management platform on top of this architecture is also developed in order to implement more scalable power optimization techniques.
From the router-level perspective, four design styles for implementing power-efficient reconfigurable interfaces in VFI-based NoC systems are proposed. To enhance the utilization of virtual channel buffers and to manage their power consumption, a partial virtual channel sharing method for NoC routers is devised and implemented.
Extensive experiments with synthetic and real benchmarks show significant power savings and mitigated hotspots with similar performance compared to the latest NoC architectures. The thesis concludes that carefully co-designed elements from different network levels enable considerable power savings for many-core systems.
Circuit design and analysis for on-FPGA communication systems
The on-chip communication system has emerged as a prominently important subject in Very-Large-Scale-Integration (VLSI) design, as the trend of technology scaling favours logic over interconnects. Interconnects often dictate the system performance, and, therefore, research into new methodologies and system architectures that deliver high-performance communication services across the chip is mandatory. The interconnect challenge is exacerbated in the Field-Programmable Gate Array (FPGA), a type of ASIC where the hardware can be programmed post-fabrication. Communication across an FPGA will deteriorate as a result of interconnect scaling. The programmable fabrics, switches and the specific routing architecture also introduce additional latency and bandwidth degradation, further hindering intra-chip communication performance.
Past research efforts mainly focused on optimizing logic elements and functional units in FPGAs. Communication with programmable interconnect received little attention and is inadequately understood. This thesis is among the first to research on-chip communication systems that are built on top of programmable fabrics, and it proposes methodologies to maximize the interconnect throughput performance. There are three major contributions in this thesis: (i) an analysis of on-chip interconnect fringing, which degrades the bandwidth of communication channels due to routing congestion in reconfigurable architectures; (ii) a new analogue wave signalling scheme that significantly improves the interconnect throughput by exploiting the fundamental electrical characteristics of the reconfigurable interconnect structures, and that can potentially mitigate the interconnect scaling challenges; and (iii) a novel Dynamic Programming (DP)-network to provide adaptive routing in network-on-chip (NoC) systems. The DP-network architecture performs runtime optimization for route planning and dynamic routing, which effectively utilizes the in-silicon bandwidth. This thesis explores a new horizon in reconfigurable system design, in which new methodologies and concepts are proposed to enhance the on-FPGA communication throughput performance that is of vital importance in new technology processes.
On-Chip Network Design: Mapping, Management, and Routing
Thesis (Ph.D.), Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2016. Kiyoung Choi.
For decades, advances in semiconductor technology have led us into the era of many-core systems. Today's desktop computers already have multi-core processors, and chips with more than a hundred cores are commercially available. As a communication medium for such a large number of cores, the network-on-chip (NoC) has emerged and is now widely used in research and in commercial products. Adopting an NoC for a many-core system raises many problems, and this thesis tries to solve some of them.
The second chapter of this thesis is on mapping and scheduling of tasks on NoC-based CMP architectures. Although many papers on NoC mapping have already been published, our work shows that choosing the communication type, shared memory or message passing, can help improve performance and energy efficiency. Additionally, our framework supports scheduling applications containing backward dependencies with the help of modified modulo scheduling.
Evolving the SoCs through 3D stacking makes us face a number of new problems, and the thermal problem coming from increased power density is one of them.
In the third chapter of this thesis, we try to mitigate the hotspot problem using DVFS techniques. Assuming that all routers as well as cores can control their voltage and frequency individually, we find the voltage-frequency pairs for all cores and routers that yield the best performance within the given thermal constraint.
The fourth and the fifth chapters of this thesis take a different perspective. In 3D stacking, inter-layer interconnections are implemented using through-silicon vias (TSVs). TSVs usually take much more area than normal wires. Furthermore, they consume silicon area as well as metal area. For this reason, designers would want to limit the number of TSVs used in their network. To limit the TSV count, there are two options: the first is to reduce the width of each vertical link, and the other is to use fewer vertical links, which results in a partially connected network. We present one routing methodology for each case.
For the network with reduced-bandwidth vertical links, we propose using deflection routing to mitigate the long latency of vertical links. By balancing the vertical traffic properly, the algorithm provides improved latency.
Also, a large amount of area and energy reduction can be obtained by the removal of router buffers.
For partially connected networks, we introduce a set of routing rules for selecting the vertical links. At the expense of some routing freedom, the proposed algorithm removes the virtual channel requirement for avoiding deadlock. As a result, either performance or energy efficiency can be improved, at the designer's choice.
Chapter 1 Introduction
1.1 Task Mapping and Scheduling
1.2 Thermal Management
1.3 Routing for 3D Networks
Chapter 2 Mapping and Scheduling
2.1 Introduction
2.2 Motivation
2.3 Background
2.4 Related Work
2.5 Platform Description
2.5.1 Architecture Description
2.5.2 Energy Model
2.5.3 Communication Delay Model
2.6 Problem Formulation
2.7 Proposed Solution
2.7.1 Task and Communication Mapping
2.7.2 Communication Type Optimization
2.7.3 Design Space Pruning via Pre-evaluation
2.7.4 Scheduling
2.8 Experimental Results
2.8.1 Experiments with Coarse-grained Iterative Modulo Scheduling
2.8.2 Comparison with Different Mapping Algorithms
2.8.3 Experiments with Overall Algorithms
2.8.4 Experiments with Various Local Memory Sizes
2.8.5 Experiments with Various Placements of Shared Memory
Chapter 3 Thermal Management
3.1 Introduction
3.2 Background
3.2.1 Thermal Modeling
3.2.2 Heterogeneity in Thermal Propagation
3.3 Motivation and Problem Definition
3.4 Related Work
3.5 Orchestrated Voltage-Frequency Assignment
3.5.1 Individual PI Control Method
3.5.2 PI Controlled Weighted-Power Budgeting
3.5.3 Performance/Power Estimation
3.5.4 Frequency Assignment
3.5.5 Algorithm Overview
3.5.6 Stability Conditions for PI Controller
3.6 Experimental Result
3.6.1 Experimental Setup
3.6.2 Overall Algorithm Performance
3.6.3 Accuracy of the Estimation Model
3.6.4 Performance of the Frequency Assignment Algorithm
Chapter 4 Routing for Limited Bandwidth 3D NoC
4.1 Introduction
4.2 Motivation
4.3 Background
4.4 Related Work
4.5 3D Deflection Routing
4.5.1 Serialized TSV Model
4.5.2 TSV Link Injection/Ejection Scheme
4.5.3 Deadlock Avoidance
4.5.4 Livelock Avoidance
4.5.5 Router Architecture: Putting It All Together
4.5.6 System Level Consideration
4.6 Experimental Results
4.6.1 Experimental Setup
4.6.2 Results on Synthetic Traffic Patterns
4.6.3 Results on Realistic Traffic Patterns
4.6.4 Results on Real Application Benchmarks
4.6.5 Fairness Issue
4.6.6 Area Cost Comparison
Chapter 5 Routing for Partially Connected 3D NoC
5.1 Introduction
5.2 Background
5.3 Related Work
5.4 Proposed Algorithm
5.4.1 Preliminary
5.4.2 Routing Algorithm for 3-D Stacked Meshes with Regular Partial Vertical Connections
5.4.3 Routing Algorithm for 3-D Stacked Meshes with Irregular Partial Vertical Connections
5.4.4 Extension to Heterogeneous Mesh Layers
5.5 Experimental Results
5.5.1 Experimental Setup
5.5.2 Experiments on Synthetic Traffics
5.5.3 Experiments on Application Benchmarks
5.5.4 Comparison with Reduced Bandwidth Mesh
Chapter 6 Conclusion
Bibliography
Abstract
2.5D Chiplet Architecture for Embedded Processing of High Velocity Streaming Data
This dissertation presents an energy-efficient 2.5D chiplet-based architecture for real-time probabilistic processing of high-velocity sensor data from an autonomous real-time ubiquitous surveillance imaging system. This work addresses problems at all levels of description.
At the lowest physical level, new standard cell libraries have been developed for ultra-low voltage CMOS synthesis, as well as custom SRAM memory blocks, and mixed-signal physical true random number generators based on the perturbation of Sigma-Delta structures using random telegraph noise (RTN) in single transistor devices.
At the chip-level architecture, an innovative compact buffer-less switched circuit mesh network on chip (NoC) was designed for this chiplet-based solution; it is capable of reaching very high throughput (1.6Tbps) with finite packet delivery delay, and is free from packet dropping, deadlocks and livelocks. Additionally, a second NoC connecting processors in the network was implemented based on token rings, allowing access to external DDR memory. Furthermore, a new clock tree distribution network and a wide-bandwidth DRAM physical interface have been designed to address the data flow requirements within and across chiplets.
At the algorithm and representation levels, the Online Change Point Detection (CPD) algorithm has been implemented for on-line learning of background-foreground segmentation. Instead of using the traditional binary representation of numbers, this architecture relies on unconventional processing of signals using a bio-inspired (spike-based) unary representation of numbers, where these numbers are represented in a stochastic stream of Bernoulli random variables. By using this representation, probabilistic algorithms can be executed in a native architecture with precision on demand: if more accuracy is required, more computational time and power can be allocated. The SoC chiplet architecture has been extensively simulated and validated using state-of-the-art CAD methodology, and has been submitted to fabrication in a dedicated 55nm GF CMOS technology wafer run. Experimental results from fabricated test chips in the same technology are also presented.
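The unary representation described above, a value encoded as the fraction of ones in a Bernoulli bit stream, can be sketched in a few lines. This is a generic illustration of stochastic computing and not the dissertation's hardware design: the function names are hypothetical, and multiplication by a bitwise AND of two independent streams is the textbook technique, assumed here for illustration.

```python
import random


def encode(p, length, rng):
    """Encode a probability p in [0, 1] as a stream of Bernoulli(p) bits."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def decode(stream):
    """Estimate the encoded value as the fraction of ones in the stream."""
    return sum(stream) / len(stream)


def multiply(a, b):
    """Bitwise AND of two independent streams; its mean estimates the product."""
    return [x & y for x, y in zip(a, b)]


def estimate_product(p, q, length, seed=0):
    """Estimate p * q from streams of the given length (seeded for repeatability)."""
    rng = random.Random(seed)
    return decode(multiply(encode(p, length, rng), encode(q, length, rng)))


# "Precision on demand": a longer stream costs more time but narrows the estimate.
coarse = estimate_product(0.5, 0.4, 100)
fine = estimate_product(0.5, 0.4, 100_000)
```

The standard error of the estimate shrinks with the square root of the stream length, which is the trade-off between accuracy and computation time that the abstract refers to.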
Cellular Automata
Modelling and simulation are disciplines of major importance for science and engineering. There is no science without models, and simulation has nowadays become a very useful, sometimes unavoidable, tool for the development of both science and engineering. The main attractive feature of cellular automata is that, in spite of their conceptual simplicity, which allows easy implementation for computer simulation as well as, in principle, a detailed and complete mathematical analysis, they are able to exhibit a wide variety of amazingly complex behaviour. This feature of cellular automata has attracted the attention of researchers from a wide variety of fields in the exact sciences and engineering, but also in the social sciences, and sometimes beyond. The collective complex behaviour of numerous systems, which emerges from the interaction of a multitude of simple individuals, is being conveniently modelled and simulated with cellular automata for very different purposes. In this book, a number of innovative applications of cellular automata models in the fields of Quantum Computing, Materials Science, Cryptography and Coding, and Robotics and Image Processing are presented.
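The ease of implementation mentioned above can be illustrated with an elementary (one-dimensional, binary, nearest-neighbour) cellular automaton. This is a minimal sketch rather than a model from the book: the function names are hypothetical, cells beyond the array edges are assumed to be 0, and the rule is given by its Wolfram number.

```python
def step(cells, rule):
    """Apply one synchronous update of an elementary cellular automaton.

    `rule` is the Wolfram rule number (0..255). Cells beyond the edges are
    treated as 0, which is an assumption of this sketch.
    """
    padded = [0] + cells + [0]
    out = []
    for i in range(1, len(padded) - 1):
        # Pack (left, centre, right) into a 3-bit index into the rule table.
        neighbourhood = (padded[i - 1] << 2) | (padded[i] << 1) | padded[i + 1]
        out.append((rule >> neighbourhood) & 1)
    return out


def run(cells, rule, steps):
    """Return the list of generations, starting with the initial one."""
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history


# Rule 90 (each cell becomes left XOR right) grown from a single live cell.
history = run([0, 0, 0, 1, 0, 0, 0], rule=90, steps=3)
for row in history:
    print("".join("#" if c else "." for c in row))
```

Even this tiny local rule produces a nested triangular pattern when run on a wider array for more steps, a small instance of complex behaviour arising from simple components.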
Particle swarm optimization for routing and wavelength assignment in next generation WDM networks.
PhD
All-optical Wavelength Division Multiplexed (WDM) networking is a promising technology for long-haul backbone and large metropolitan optical networks in order to meet the non-diminishing bandwidth demands of future applications and services. Examples could include archival and recovery of data to/from Storage Area Networks (e.g. for banks), high-bandwidth medical imaging (for remote operations), High Definition (HD) digital broadcast and streaming over the Internet, distributed orchestrated computing, and peak-demand short-term connectivity for Access Network providers and wireless network operators for backhaul surges. One desirable feature is fast and automatic provisioning. Connection (lightpath) provisioning in optically switched networks requires both route computation and a single wavelength to be assigned for the lightpath. This is called Routing and Wavelength Assignment (RWA). RWA can be classified as static RWA and dynamic RWA. Static RWA is an NP-hard (non-deterministic polynomial-time hard) optimisation task. Dynamic RWA is even more challenging, as connection requests arrive dynamically, on the fly, and have random connection holding times. Traditionally, global-optimum mathematical search schemes like integer linear programming and graph colouring are used to find an optimal solution for NP-hard problems. However, such schemes become unusable for connection provisioning in a dynamic environment, due to the computational complexity and time required to undertake the search. To perform dynamic provisioning, different heuristic and stochastic techniques are used.
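The wavelength-assignment half of RWA can be sketched with a first-fit heuristic: given an already computed route, pick the lowest-indexed wavelength that is free on every link of that route. This is a generic baseline shown for illustration, not the scheme proposed in the thesis; the function name, the link representation, and the example topology are all assumptions.

```python
def first_fit(path_links, in_use, num_wavelengths):
    """Assign the lowest-indexed wavelength that is free on every link of the path.

    `in_use` maps a link to the set of wavelengths already occupied on it and is
    updated in place. Returns the wavelength index, or None if the request is
    blocked. The same wavelength must be free end to end (wavelength continuity).
    """
    for w in range(num_wavelengths):
        if all(w not in in_use.get(link, set()) for link in path_links):
            for link in path_links:
                in_use.setdefault(link, set()).add(w)
            return w
    return None


# Hypothetical 4-node chain A-B-C-D with 2 wavelengths per link.
in_use = {}
r1 = first_fit([("A", "B"), ("B", "C")], in_use, 2)  # takes wavelength 0
r2 = first_fit([("B", "C"), ("C", "D")], in_use, 2)  # 0 is busy on B-C, takes 1
r3 = first_fit([("A", "B")], in_use, 2)              # 0 is busy on A-B, takes 1
r4 = first_fit([("A", "B"), ("B", "C")], in_use, 2)  # both wavelengths busy: blocked
```

Requests that return None count towards the blocking probability. A dynamic scheme would also release wavelengths when a connection's holding time ends, which this sketch omits.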
Particle Swarm Optimisation (PSO) is a population-based global optimisation scheme that belongs to the class of evolutionary search algorithms and has successfully been used to solve many NP-hard optimisation problems in both static and dynamic environments. In this thesis, a novel PSO-based scheme is proposed to solve the static RWA case, which can achieve an optimal or near-optimal solution. In order to reduce the risk of premature convergence of the swarm and to avoid selecting local optima, a search scheme is proposed to solve the static RWA, based on the position of the swarm's global best particle and the personal best position of each particle.
To solve the dynamic RWA problem, a PSO-based scheme is proposed which can provision a connection within a fraction of a second. This feature is crucial to provisioning services like bandwidth-on-demand connectivity. To improve the convergence speed of the swarm towards an optimal or near-optimal solution, a novel chaotic factor is introduced into the PSO algorithm, i.e. CPSO, which helps the swarm reach a relatively good solution in fewer iterations. Experimental results for PSO/CPSO-based dynamic RWA algorithms show that the proposed schemes perform better than other evolutionary techniques such as genetic algorithms and ant colony optimization, both in terms of quality of solution and computation time. The proposed schemes also show significant improvements in blocking probability performance compared to traditional dynamic RWA schemes like the SP-FF and SP-MU algorithms.
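The interplay of personal-best and global-best positions in particle swarm optimisation can be sketched with the standard continuous global-best PSO update. This is not the thesis's RWA encoding or its chaotic variant (CPSO), neither of which is specified in the abstract; the function names, the parameter values (inertia 0.7, acceleration coefficients 1.5) and the sphere test function are illustrative assumptions.

```python
import random


def pso(objective, dim, bounds, particles=30, iterations=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best so far

    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the particle to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Simple convex test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
```

Applying PSO to RWA additionally requires mapping particle positions to discrete routes and wavelengths, which is where a problem-specific scheme such as the one described in the abstract comes in.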
Design of complex integrated systems based on networks-on-chip: Trading off performance, power and reliability
The steady advancement of microelectronics is associated with an escalating number of challenges for design engineers due to both the tiny dimensions and the enormous complexity of integrated systems. Against this background, this work deals with Network-on-Chip (NoC) as the emerging design paradigm to cope with diverse issues of nanotechnology. The detailed investigations within the chapters focus on the communication-centric aspects of multi-core systems, with performance, power consumption and reliability considered equally as the essential design criteria.
- …