3,576 research outputs found
Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review
Technological evolution has significantly increased the number of transistors for a given die area and raised switching speeds from a few MHz into the GHz range. This shrinking of feature size coupled with the boost in performance demands lower supply voltages and effective management of power dissipation in chips with millions of transistors. It has triggered a substantial amount of research into power reduction techniques for almost every aspect of the chip, and particularly for the processor cores the chip contains. This paper presents an overview of techniques for achieving power efficiency mainly at the processor core level, but also visits related domains such as buses and memories. Various processor parameters and features, such as supply voltage, clock frequency, caches and pipelining, can be optimized to reduce the power consumption of the processor, and this paper discusses the ways in which they can be optimized. Emerging power-efficient processor architectures and ongoing research activities are also surveyed, which should help the reader identify how these factors contribute to a processor's power consumption. Some of these concepts are already established, whereas others remain active research areas. © 2009 ACADEMY PUBLISHER
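The knobs this review surveys (supply voltage, clock frequency, and the switched capacitance shaped by cache and pipeline design) all enter the standard first-order CMOS power model, with α the activity factor, C the switched capacitance, V_dd the supply voltage, f the clock frequency and I_leak the leakage current:

```latex
P_{\text{total}} \;=\; \underbrace{\alpha\, C\, V_{dd}^{2}\, f}_{\text{dynamic (switching)}} \;+\; \underbrace{V_{dd}\, I_{\text{leak}}}_{\text{static (leakage)}}
```

The quadratic dependence on V_dd is why supply-voltage scaling is the single most effective knob, at the cost of slower switching.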
EARLY PERFORMANCE PREDICTION METHODOLOGY FOR MANY-CORES ON CHIP BASED APPLICATIONS
Modern high-performance computing applications such as personal computing, gaming and numerical simulation require application-specific integrated circuits (ASICs) that comprise many cores. Performance for these applications depends mainly on the latency of the interconnects that transfer data between the cores, which implement applications by distributing tasks. Time-to-market is a critical consideration when designing ASICs for these applications; therefore, to reduce design cycle time, it is essential to predict system performance accurately at an early stage of design. With process technology in the nanometer era, physical phenomena such as crosstalk and reflection on propagating signals have a direct impact on performance, and incorporating these effects yields a better early performance estimate. This work presents a methodology for better performance prediction at an early stage of design, achieved by mapping the system specification to a circuit-level netlist description.
At the system level, SystemVerilog descriptions are employed to simplify description and enable efficient simulation. For modeling system performance at this abstraction, queueing-theory-based bounded queue models are applied. At the circuit level, behavioral Input/Output Buffer Information Specification (IBIS) models can be used to analyze the effects of these physical phenomena on on-chip signal integrity, and hence on performance.
For behavioral circuit-level performance simulation with IBIS models, a netlist consisting of interacting cores and a communication link must be described. Two new netlists, IBIS-ISS and IBIS-AMI-ISS, are introduced for this purpose. The cores are represented by macromodels automatically generated from IBIS models by a tool developed in this work, and these macromodels are employed in the new netlists. The early performance prediction methodology maps a system specification to an instance of these netlists to provide a better performance estimate at an early stage of design. The methodology scales with nanometer process technology and can be reused across designs.
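As a hedged illustration of the queueing-theory-based bounded queue models mentioned above, the sketch below computes steady-state occupancy, blocking probability and mean sojourn time for an M/M/1/K queue, the simplest bounded-queue abstraction of a communication link; the parameter values are arbitrary, not from this work:

```python
def mm1k_metrics(lam, mu, K):
    """Steady-state metrics for an M/M/1/K bounded queue:
    mean occupancy, blocking probability, and mean sojourn time
    (the last via Little's law on the effective arrival rate)."""
    rho = lam / mu
    if abs(rho - 1.0) < 1e-12:
        probs = [1.0 / (K + 1)] * (K + 1)
    else:
        norm = (1 - rho ** (K + 1)) / (1 - rho)
        probs = [rho ** n / norm for n in range(K + 1)]
    occupancy = sum(n * p for n, p in enumerate(probs))
    p_block = probs[K]          # probability an arriving flit is dropped/stalled
    lam_eff = lam * (1 - p_block)
    sojourn = occupancy / lam_eff
    return occupancy, p_block, sojourn

# Illustrative link: 80% offered load, unit service rate, 8 buffer slots.
occ, blk, soj = mm1k_metrics(lam=0.8, mu=1.0, K=8)
print(f"occupancy={occ:.3f}  blocking={blk:.4f}  sojourn={soj:.3f}")
```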
Circuit design and analysis for on-FPGA communication systems
On-chip communication has emerged as a prominently important subject in Very-Large-Scale Integration (VLSI) design, as the trend of technology scaling favours logic more than interconnects. Interconnects often dictate system performance, and research into new methodologies and system architectures that deliver high-performance communication services across the chip is therefore mandatory. The interconnect challenge is exacerbated in the Field-Programmable Gate Array (FPGA), a class of device whose hardware can be programmed post-fabrication. Communication across an FPGA deteriorates as a result of interconnect scaling, and the programmable fabrics, switches and the specific routing architecture introduce additional latency and bandwidth degradation, further hindering intra-chip communication performance.

Past research efforts mainly focused on optimizing logic elements and functional units in FPGAs; communication over the programmable interconnect has received little attention and is inadequately understood. This thesis is among the first to research on-chip communication systems built on top of programmable fabrics, and it proposes methodologies to maximize interconnect throughput. There are three major contributions in this thesis: (i) an analysis of on-chip interconnect fringing, which degrades the bandwidth of communication channels due to routing congestion in reconfigurable architectures; (ii) a new analogue wave signalling scheme that significantly improves interconnect throughput by exploiting the fundamental electrical characteristics of the reconfigurable interconnect structures, and can potentially mitigate the interconnect scaling challenges; and (iii) a novel Dynamic Programming (DP)-network that provides adaptive routing in network-on-chip (NoC) systems, performing runtime optimization for route planning and dynamic routing and thereby effectively utilizing the in-silicon bandwidth. This thesis explores a new horizon in reconfigurable system design, in which new methodologies and concepts are proposed to enhance on-FPGA communication throughput, a property of vital importance in new technology processes.
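The route-planning side of a DP-network can be illustrated with a software sketch: each node repeatedly relaxes its cost-to-destination from its neighbours, which is the propagation a hardware DP-network performs in parallel at runtime. The mesh size and congestion-dependent link costs below are invented for illustration, not taken from the thesis:

```python
import itertools

# 4x4 mesh NoC; every node computes its minimum cost to reach DEST.
W = H = 4
DEST = (3, 3)

def neighbours(x, y):
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < W and 0 <= ny < H:
            yield nx, ny

def link_cost(a, b):
    # Hypothetical congestion model: links entering the four central
    # nodes are 50% slower than edge links.
    return 1.0 + (0.5 if 1 <= b[0] <= 2 and 1 <= b[1] <= 2 else 0.0)

def plan_routes():
    """Bellman-Ford-style value iteration: the dynamic-programming
    recurrence cost[n] = min over neighbours m of link(n,m) + cost[m]."""
    cost = {n: float("inf") for n in itertools.product(range(W), range(H))}
    cost[DEST] = 0.0
    for _ in range(W * H):  # enough sweeps for a 4x4 mesh to converge
        for n in cost:
            if n != DEST:
                cost[n] = min(link_cost(n, m) + cost[m] for m in neighbours(*n))
    return cost

cost = plan_routes()
print(cost[(0, 0)])   # cheapest route skirts the congested centre
```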
Physics-Based Electromigration Modeling and Analysis and Optimization
Long-term reliability is a major concern in modern VLSI design. The literature has shown that reliability worsens as technology advances, and future VLSI systems are expected to have shorter reliability-induced lifetimes than previous generations. One of the most serious reliability effects, electromigration (EM), is the physical migration of metal atoms caused by momentum exchange between the conducting electrons and the atoms. It can change wire resistance or open the circuit, resulting in functional failure. Among all interconnect wires, power-ground networks are the most vulnerable to EM because they carry the largest currents on the chip. With new-generation technology nodes and aggressive design strategies, more accurate and efficient EM models are required; traditional EM approaches are very conservative and cannot support such aggressive designs. Beyond the circuit level, EM also needs to be studied thoroughly at the system level because of the limited power and temperature budgets shared among on-chip cores. This research focuses on developing physics-based EM models for VLSI circuits and system-level EM optimization for multi-core systems to overcome these problems. Specifically, at the physical level, we develop two EM immortality check methods and a power grid EM check method. First, a voltage-based EM immortality analysis has been developed, with which the immortality condition in the nucleation phase can be determined quickly and accurately for multi-segment interconnect wires. Second, a saturation-volume-based incubation-phase immortality check method has been proposed, which further reduces redundancy in VLSI circuit design through multi-phase immortality checking. Both immortality check methods are integrated as filters into a new power grid EM check methodology (EMspice).
These filters accelerate the simulation by screening out immortal trees, so that full simulation is only needed for the fewer trees that are mortal. Coupled EM simulation, considering both hydrostatic stress and electronic current/voltage in the power grid network, is then applied to these mortal trees. The tool works seamlessly with commercial synthesis flows. Beyond physical-level reliability models, system-level reliability optimization is also addressed in this research. A deep-reinforcement-learning-based EM optimization has been proposed for multi-core systems, considering both the long-term reliability effect (hard errors) and transient soft errors; energy can be optimized under all reliability and other constraints quickly and accurately compared with existing reliability management techniques. Finally, a scheduling-based reliability optimization method for multi-core systems has been proposed in which NBTI, HCI and EM are considered jointly; the lifetime of the system can be improved significantly compared with traditional methods that focus mainly on utilization.
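For context, the traditional conservative approach that the abstract contrasts against is typically based on Black's empirical lifetime equation together with the Blech immortality criterion, with j the current density, L the wire length, E_a the EM activation energy, k Boltzmann's constant, T the temperature, and A, n fitted constants:

```latex
\mathrm{MTTF} \;=\; A\, j^{-n} \exp\!\left(\frac{E_a}{kT}\right),
\qquad \text{immortal if } \; jL < (jL)_{\text{crit}}
```

Physics-based methods such as the ones described above instead track hydrostatic stress evolution along each interconnect tree, which is what allows multi-segment wires to be checked accurately rather than segment by segment.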
Timing Analysis and Design Rule Violation Prediction of Interconnects for Deep Sub-Micron Circuit Design
Thesis (Ph.D.) -- Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2021.
Timing analysis and clearing design rule violations are essential steps for taping out a chip. However, they keep getting harder in deep sub-micron circuits because the variations of transistors and interconnects have been increasing and design rules have become more complex. This dissertation addresses two problems in timing analysis and design rule violations for synthesizing deep sub-micron circuits.
Firstly, timing analysis at process corners cannot capture post-Si performance accurately, because the slowest path at a process corner is not always the slowest one in the post-Si instances. In addition, the proportion of interconnect delay in the critical path on a chip is increasing and exceeds 20% in sub-10nm technologies, which means that in order to capture post-Si performance accurately, the representative critical path circuit should reflect not only FEOL (front-end-of-line) but also BEOL (back-end-of-line) variations. Since the number of BEOL metal layers exceeds ten, and the layers exhibit variation in resistance and capacitance intermixed with resistance variation on the vias between them, a very high-dimensional design-space exploration is necessary to synthesize a representative critical path circuit able to provide an accurate performance prediction. To cope with this, I propose a BEOL-aware methodology for synthesizing a representative critical path circuit, which incrementally explores, starting from an initial path circuit on the post-Si target circuit, routing patterns (i.e., BEOL reconfiguring) as well as gate resizing on the path circuit. Precisely, the synthesis framework for the critical path circuit integrates a set of novel techniques: (1) extracting and classifying BEOL configurations to lighten design-space complexity, (2) formulating BEOL random variables for fast and accurate timing analysis, and (3) exploring alternative (ring oscillator) circuit structures to extend the applicability of this work.
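The corner-versus-post-Si discrepancy motivating this work can be reproduced with a toy Monte Carlo: a gate-dominated path that is slowest at the nominal corner is frequently overtaken post-Si by an interconnect-heavy path with a larger BEOL spread. All delay numbers and sigmas below are invented for illustration:

```python
import random

random.seed(42)

def sample_delay(gate_nom, wire_nom, sigma_feol=0.05, sigma_beol=0.10):
    # Toy variation model: gate delay scales with a FEOL sample,
    # wire delay with a (wider) BEOL sample; sigmas are illustrative.
    g = gate_nom * random.gauss(1.0, sigma_feol)
    w = wire_nom * random.gauss(1.0, sigma_beol)
    return g + w

# Path A: gate-dominated, nominally slowest (110 ps total).
# Path B: interconnect-heavy (>20% wire delay), nominally faster (109 ps).
A = (100.0, 10.0)
B = (85.0, 24.0)

flips = sum(sample_delay(*B) > sample_delay(*A) for _ in range(10_000))
print(f"B is the slowest path in {flips / 100:.1f}% of post-Si samples")
```

Even though A is slower at the nominal corner, B ends up critical in a large fraction of samples, which is exactly why a representative critical path circuit must reflect BEOL variation.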
Secondly, the complexity of design rules has been increasing, resulting in more design rule violations during routing. In addition, the size of standard cells keeps decreasing, which makes routing harder. In the conventional P&R flow, the routability of a pre-routed layout is predicted from the routing congestion obtained in global routing, and the placement is then optimized so as not to cause design rule violations. This turns out to be inaccurate at advanced technology nodes, so routability must be predicted using more features. I propose a methodology of predicting the hotspots of design rule violations (DRVs) using machine learning with placement-related features and the conventional routing congestion, and perturbing placed cells to reduce the number of DRVs. Precisely, the hotspots are predicted by a pre-trained binary classification model, and placement perturbation is performed by global optimization methods to minimize the number of DRVs predicted by a pre-trained regression model. To do this, the framework is composed of three techniques: (1) dividing the circuit layout into multiple rectangular grids and extracting features such as pin density, cell density and global routing results (demand, capacity and overflow) in the placement phase, (2) predicting whether each grid has DRVs using a binary classification model, and (3) perturbing the placed standard cells in the hotspots to minimize the number of DRVs predicted by a regression model.
1 Introduction
1.1 Representative Critical Path Circuit
1.2 Prediction of Design Rule Violations and Placement Perturbation
1.3 Contributions of This Dissertation
2 Methodology for Synthesizing Representative Critical Path Circuits reflecting BEOL Timing Variation
2.1 Motivation
2.2 Definitions and Overall Flow
2.3 Techniques for BEOL-Aware RCP Generation
2.3.1 Clustering BEOL Configurations
2.3.2 Formulating Statistical BEOL Random Variables
2.3.3 Delay Modeling
2.3.4 Exploring Ring Oscillator Circuit Structures
2.4 Experimental Results
2.5 Further Study on Variations
3 Methodology for Reducing Routing Failures through Enhanced Prediction on Design Rule Violations in Placement
3.1 Motivation
3.2 Overall Flow
3.3 Techniques for Reducing Routing Failures
3.3.1 Binary Classification
3.3.2 Regression
3.3.3 Optimization
3.3.4 Placement Perturbation
3.4 Experiments
3.4.1 Experiments Setup
3.4.2 Hotspot Prediction
3.4.3 Regression
3.4.4 Placement Perturbation
4 Conclusions
4.1 Synthesis of Representative Critical Path Circuits reflecting BEOL Timing Variation
4.2 Reduction of Routing Failures through Enhanced Prediction on Design Rule Violations in Placement
Abstract (In Korean)
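The DRV-hotspot flow summarized in the abstract above (per-grid features, binary classification, then perturbation guided by a regression model) can be sketched with a toy classifier. The logistic weights, threshold and feature values below are invented placeholders standing in for the dissertation's pre-trained model:

```python
import math

def hotspot_score(pin_density, cell_density, overflow,
                  weights=(2.0, 1.5, 3.0), bias=-4.0):
    """Toy stand-in for the pre-trained binary classifier: a logistic
    score over the per-grid features the abstract lists (pin density,
    cell density, global-routing overflow). Weights are invented."""
    z = bias + weights[0] * pin_density + weights[1] * cell_density \
        + weights[2] * overflow
    return 1.0 / (1.0 + math.exp(-z))

def predict_hotspots(grids, threshold=0.5):
    # Grids scoring above the threshold are flagged for placement
    # perturbation in a later step.
    return [g for g in grids if hotspot_score(g["pin"], g["cell"], g["ovfl"]) >= threshold]

grids = [
    {"id": (0, 0), "pin": 0.2, "cell": 0.5, "ovfl": 0.0},   # sparse grid
    {"id": (0, 1), "pin": 0.9, "cell": 0.8, "ovfl": 0.7},   # congested
    {"id": (1, 1), "pin": 0.6, "cell": 0.9, "ovfl": 0.9},   # congested
]
hot = predict_hotspots(grids)
print([g["id"] for g in hot])
```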
Cross-Layer Pathfinding for Off-Chip Interconnects
Off-chip interconnects for integrated circuits (ICs) today span a diverse design space, covering many different applications that require transmission of data at various bandwidths, latencies and link lengths. Off-chip interconnect design solutions are sensitive in varying degrees to system performance, power and cost metrics, while also having a strong impact on these metrics. The costs associated with off-chip interconnects include die area, package (PKG) and printed circuit board (PCB) area, technology and bill of materials (BOM). Choices made regarding off-chip interconnects are fundamental to product definition, architecture, design implementation and technology enablement. Given their cross-layer impact, it is imperative that a cross-layer approach be employed to architect and analyze off-chip interconnects up front, so that a top-down design flow can comprehend the cross-layer impacts and correctly assess the system performance, power and cost tradeoffs for off-chip interconnects. Chip architects are not exposed to all the tradeoffs at the physical and circuit implementation or technology layers, and often lack the tools to accurately assess off-chip interconnects. Furthermore, the collaterals needed for a detailed analysis are often lacking when the chip is architected; these include circuit design and layout, PKG and PCB layout, and the physical floorplan and implementation. To address the need for a framework that enables architects to assess the system-level impact of off-chip interconnects, this thesis presents power-area-timing (PAT) models for off-chip interconnects, optimization and planning tools with the appropriate abstraction using these PAT models, and die/PKG/PCB co-design methods that help expose the off-chip interconnect cross-layer metrics to the die/PKG/PCB design flows.
Together, these models, tools and methods enable cross-layer optimization that allows for a top-down definition and exploration of the design space and helps converge on the correct off-chip interconnect implementation and technology choice. The tools presented cover off-chip memory interfaces for mobile and server products, silicon photonic interfaces, 2.5D silicon interposers and 3D through-silicon vias (TSVs). The goal of the cross-layer framework is to assess the key metrics of the interconnect (such as timing, latency, active/idle/sleep power, and area/cost) at an appropriate level of abstraction by working across the layers of the design flow. In addition to signal interconnects, this thesis also explores the need for such cross-layer pathfinding for power distribution networks (PDNs), where the system-on-chip (SoC) floorplan and pinmap must be optimized before the collateral layouts for PDN analysis are ready. Altogether, the developed cross-layer pathfinding methodology for off-chip interconnects enables more rapid and thorough exploration of a vast design space of off-chip parallel and serial links, inter-die and inter-chiplet links and silicon photonics. Such exploration will pave the way for off-chip interconnect technology enablement that is optimized for system needs. The basis of the framework can be extended to cover other interconnect technologies as well, since it fundamentally relates to system-level metrics that are common to all off-chip interconnects.
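As an illustration of the kind of early tradeoff such PAT models enable, the toy sketch below picks a parallel-link configuration that meets a bandwidth target at minimum power, exposing the power/area tension before any layout exists. All lane counts, rates, energies and areas are invented placeholders, not values from the thesis:

```python
# Hypothetical PAT entries for an off-chip parallel link:
# (lanes, Gb/s per lane, pJ/bit, mm^2 per lane)
CONFIGS = [
    (16, 4.0, 3.0, 0.010),   # wide and slow: cheap energy, more beachfront
    (8,  8.0, 4.5, 0.015),   # narrower, faster lanes
    (4, 16.0, 7.0, 0.025),   # serial-leaning: least area, most pJ/bit
]

def best_config(target_gbps):
    """Return the minimum-power feasible config as
    (power_W, area_mm2, lanes, rate); None if the target is unmeetable."""
    feasible = []
    for lanes, rate, pj_bit, area_lane in CONFIGS:
        bw = lanes * rate
        if bw >= target_gbps:
            power_w = pj_bit * 1e-12 * bw * 1e9   # pJ/bit * bits/s
            feasible.append((power_w, lanes * area_lane, lanes, rate))
    return min(feasible) if feasible else None

power, area, lanes, rate = best_config(60.0)
print(f"{lanes} lanes @ {rate} Gb/s: {power:.3f} W, {area:.2f} mm^2")
```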
Robust and Traffic Aware Medium Access Control Mechanisms for Energy-Efficient mm-Wave Wireless Network-on-Chip Architectures
To cater to performance-per-watt needs, processors with multiple processing cores on the same chip have become the de-facto design choice. In such multicore systems, the Network-on-Chip (NoC) serves as the communication infrastructure for data transfer among the cores. However, conventional metallic-interconnect-based NoCs are constrained by their long multi-hop latencies and high power consumption, limiting the performance gain in these systems. Among the different alternatives, low-latency wireless interconnects operating in the millimeter-wave (mm-wave) band are a near-term solution to this multi-hop communication problem, owing to their CMOS compatibility and energy efficiency. This has led to the recent exploration of mm-wave wireless technologies in wireless NoC architectures (WiNoCs).
To realize the mm-wave wireless interconnect in a WiNoC, a wireless interface (WI) equipped with an on-chip antenna and a transceiver circuit operating in the 60 GHz frequency range is integrated into the ports of some NoC switches. The WIs are also equipped with a medium access control (MAC) mechanism that ensures collision-free and energy-efficient communication among the WIs located at different parts of the chip. However, due to shrinking feature sizes and complex integration in CMOS technology, high-density chips like multicore systems are prone to manufacturing defects and dynamic faults during chip operation. Such failures can result in permanently broken wireless links or cause the MAC to malfunction in a WiNoC, compromising energy-efficient communication through the wireless medium. Furthermore, the energy efficiency of wireless channel access also depends on the traffic patterns of the applications running on the multicore system. Due to the bursty and self-similar nature of NoC traffic, the traffic demand of the WIs can vary both spatially and temporally, and ineffective management of this variation limits the performance and energy benefits of the novel mm-wave interconnect technology. Hence, to utilize the full potential of mm-wave interconnects in WiNoCs, the design of a simple, fair, robust and efficient MAC is of paramount importance.
The main goal of this dissertation is to propose design principles for robust and traffic-aware MAC mechanisms that provide high-bandwidth, low-latency and energy-efficient data communication in mm-wave WiNoCs. The proposed solution has two parts. In the first part, we propose a cross-layer design methodology for a robust WiNoC architecture that can minimize the effect of permanent wireless link failures and recover from transient failures caused by single-event upsets (SEUs). In the second part, we present a traffic-aware MAC mechanism that adjusts the transmission slots of the WIs based on their traffic demand; the proposed MAC is also robust against failure of the wireless access mechanism. Finally, as a future research direction, this idea of traffic awareness is extended throughout the whole NoC by enabling adaptiveness in both the wired and wireless interconnection fabrics.
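One simple form a traffic-aware MAC policy can take, shown purely as an illustrative sketch and not as the dissertation's exact mechanism, is demand-proportional slot apportionment within a token-passing superframe, with a one-slot floor so no WI ever starves:

```python
def allocate_slots(demands, total_slots):
    """Traffic-aware TDMA sketch: split a superframe's slots among
    wireless interfaces in proportion to measured demand, guaranteeing
    each WI one slot. Uses largest-remainder apportionment so the
    allocation always sums exactly to total_slots."""
    n = len(demands)
    assert total_slots >= n, "need at least one slot per WI"
    slots = [1] * n                       # starvation-free floor
    remaining = total_slots - n
    total_demand = sum(demands) or 1      # avoid division by zero
    shares = [d * remaining / total_demand for d in demands]
    for i, s in enumerate(shares):
        slots[i] += int(s)                # integer part of each share
    # Hand the leftover slots to the WIs with the largest fractions.
    leftover = remaining - sum(int(s) for s in shares)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]),
                   reverse=True)
    for i in order[:leftover]:
        slots[i] += 1
    return slots

# A bursty WI (demand 30) gets half the superframe; idle WIs keep 2 each.
print(allocate_slots([30, 10, 5, 5], total_slots=16))
```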
SYSTEM-ON-A-CHIP (SOC)-BASED HARDWARE ACCELERATION FOR HUMAN ACTION RECOGNITION WITH CORE COMPONENTS
Today, the implementation of machine vision algorithms on embedded platforms or in portable systems is growing rapidly due to the demand for machine vision in daily human life. Among the applications of machine vision, human action and activity recognition has become an active research area, and market demand for integrated smart security systems is growing rapidly. Among the available approaches, embedded vision is in the top tier; however, current embedded platforms may not be able to fully exploit the potential performance of machine vision algorithms, especially under tight power budgets. Complex algorithms can impose immense computation and communication demands, especially action recognition algorithms, which require various stages of preprocessing, processing and machine learning blocks that need to operate concurrently. The market demands embedded platforms that operate with a power consumption of only a few watts. Attempts have been made to improve the performance of traditional embedded approaches by adding more powerful processors; this may solve the computation problem but increases power consumption. System-on-a-chip field-programmable gate arrays (SoC-FPGAs) have emerged as a major architectural approach for improving power efficiency while increasing computational performance. In an SoC-FPGA, an embedded processor and an FPGA serving as an accelerator are fabricated on the same die to simultaneously improve power consumption and performance. Still, current SoC-FPGA-based vision implementations either shy away from supporting complex and adaptive vision algorithms or operate at very limited resolutions due to the immense communication and computation demands. The aim of this research is to develop an SoC-based hardware acceleration workflow for the realization of advanced vision algorithms. Hardware acceleration can improve performance for highly complex mathematical calculations or repeated functions.
The performance of an SoC system can thus be improved by using hardware acceleration to speed up the element that incurs the highest performance overhead. The outcome of this research could be used for the implementation of various vision algorithms, such as face recognition, object detection or object tracking, on embedded platforms. The contributions of SoC-based hardware acceleration for hardware-software co-design platforms include the following: (1) development of frameworks for complex human action recognition in both 2D and 3D; (2) realization of a framework with four main implemented IPs, namely, foreground and background subtraction (foreground probability), human detection, 2D/3D point-of-interest detection and feature extraction, and OS-ELM as a machine learning algorithm for action identification; (3) use of an FPGA-based hardware acceleration method to resolve system bottlenecks and improve system performance; and (4) measurement and analysis of system specifications, such as the acceleration factor, power consumption and resource utilization. Experimental results show that the proposed SoC-based hardware acceleration approach provides better performance in terms of acceleration factor, resource utilization and power consumption than recent related works. In addition, a comparison of the accuracy of the framework running on the proposed embedded platform (SoC-FPGA) with that of other PC-based frameworks shows that the proposed approach outperforms most other approaches.
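The payoff of accelerating only the highest-overhead element is bounded by Amdahl's law; a minimal sketch, where the 80% fraction and 10x factor are hypothetical numbers, not results from this work:

```python
def amdahl_speedup(accelerated_fraction, acceleration_factor):
    """Overall speedup when only a fraction of the runtime is offloaded
    to an accelerator that runs it acceleration_factor times faster."""
    p, s = accelerated_fraction, acceleration_factor
    return 1.0 / ((1.0 - p) + p / s)

# E.g. if feature extraction were 80% of frame time and an FPGA IP ran it
# 10x faster, the whole pipeline would speed up by only about 3.6x:
print(f"{amdahl_speedup(0.8, 10.0):.2f}x")
```

The unaccelerated 20% caps the gain, which is why the workflow above profiles for the true bottleneck before committing an IP to fabric.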
An Artificial Neural Networks based Temperature Prediction Framework for Network-on-Chip based Multicore Platform
Continuous improvement in silicon process technologies has made possible the integration of hundreds of cores on a single chip. However, power and heat have become dominant constraints in designing these massive multicore chips, causing issues with reliability, timing variations and reduced chip lifetime. Dynamic Thermal Management (DTM) is a solution for avoiding high temperatures on the die. Typical DTM schemes only address core-level thermal issues. However, the Network-on-Chip (NoC) paradigm, which has emerged as an enabling methodology for integrating hundreds to thousands of cores on the same die, can itself contribute significantly to the thermal issues. Moreover, typical DTM is triggered reactively, based on temperature measurements from on-chip thermal sensors, and therefore requires long reaction times, whereas predictive DTM methods estimate future temperature in advance, eliminating the chance of temperature overshoot. Artificial Neural Networks (ANNs) have been used in various domains for modeling and prediction with high accuracy due to their ability to learn and adapt. This thesis concentrates on designing an ANN prediction engine to predict the thermal profile of the cores and Network-on-Chip elements of the chip. This thermal profile is then used by a predictive DTM that combines both core-level and network-level DTM techniques. An on-chip wireless interconnect, recently envisioned to enable energy-efficient data exchange between cores in a multicore environment, is used to provide a broadcast-capable medium for efficiently distributing the thermal control messages that trigger and manage the DTM schemes.
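A full ANN engine is beyond the scope of an abstract, but the predictive-DTM idea it serves (forecast the next thermal sample and throttle before overshoot) can be sketched with a simple level-plus-trend smoother standing in for the network; the threshold, smoothing constants and temperature trace below are invented:

```python
THROTTLE_AT = 80.0   # hypothetical trigger threshold, deg C

def predict_next(history, alpha=0.5, beta=0.3):
    """Holt's linear (level + trend) smoother as a lightweight stand-in
    for the ANN: forecast the next temperature sample from a history
    of sensor readings."""
    level, trend = history[0], 0.0
    for x in history[1:]:
        prev = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev) + (1 - beta) * trend
    return level + trend

temps = [60.0, 63.0, 67.0, 70.0, 74.0]   # rising core temperature, deg C
forecast = predict_next(temps)
if forecast >= THROTTLE_AT:
    # Predictive DTM would act here, before the sensor ever reads 80 C.
    print("trigger DTM proactively")
print(f"forecast: {forecast:.1f} C")
```

Because the forecast extrapolates the trend, the DTM trigger fires one sample early instead of reacting after the overshoot, which is the advantage the thesis attributes to predictive schemes.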
- …