214 research outputs found

    Networks on Chips: Structure and Design Methodologies

    Get PDF

    Efficient Connection Allocator in Network-on-Chip

    Get PDF
    As semiconductor technologies develop, a System-on-Chip (SoC) that integrates all semiconductor intellectual property (IP) cores is suggested and widely used for various applications. A traditional bus interconnection does not support transmitting data between IP cores for high performance. Because of this reason, a Network-on-Chip (NoC) has been suggested to provide an efficient and scalable solution to interconnect among all IP cores. High throughput and low latency have recently become the main important factors of NoC for achieving hard guaranteed real-time systems. In order to guarantee these factors and provide real-time service (i.e., Guaranteed Service, GS), the circuit switching (CS) approach has been widely utilized. The CS approach allocates mutually exclusive paths to transmitting data between different sources and destinations using dedicated NoC resources. However, the exclusive occupancy of the allocated path reduces the efficiency of the overall use of NoC resources. In order to solve this problem, Space-Division-Multiplexing (SDM) and Time-Division-Multiplexing (TDM) techniques have been suggested. SDM implements a circuit switching technique by assigning physically different NoC-links between different connections. Path connections of the SDM technique based on spatial resources assignment do not provide high scalability. In contrast to this, using virtual time slots for a path connection, the TDM technique can share physical links between exclusively established connections, thereby improving NoC path diversity. For all of these mentioned techniques, the factor that significantly impacts the system efficiency or performance scaling is how the path is allocated. In recent years, a dynamic connection allocation approach that can cope with highly dynamic workloads has been gaining attention due to the sudden and diverse demands of applications in real-time systems. There are two groups in the dynamic connection allocation approach. One is a distributed allocation technique, and the other is a centralized allocation technique. While distributed allocation exploits additional logic integrated into the NoC-routers for path search and allocation, the centralized approach makes use of a central unit to manage the path allocation problem. There are several algorithms for the centralized allocation technique. Trellis search-based allocation approach shows the best performance among them. Many algorithms related to centralized connection allocators have been studied extensively during the past decade. However, relatively little attention was paid to methodology in analyzing and evaluating the centralized connection allocation algorithms. In order to further develop the algorithms, it is necessary to understand and evaluate the centralized connection allocator by establishing a new analysis methodology. Thus, this thesis presents a performance analysis methodology for the trellis search-based allocation approach. Firstly, this thesis proposes a system model for analysis. Secondly, performance metrics are defined. Finally, the analysis results of each performance metric related to the trellis search-based allocation approach are presented. Through this analysis, the performance of the trellis search-based allocation approach can be accurately analyzed. Although a simulation is not performed, the upper limit of performance of the trellis search-based allocation approach can also be predicted through the analysis metrics. Additionally, we introduce the general formulation of the trellis search-based path allocation algorithm. The weight values among available paths through the branch metric and path metric are proposed to enable higher performance path connection. Furthermore, according to network size, topology, TDM, interface load delivery, and router internal storage, the performance of trellis search-based path allocation algorithms is also described. In the end, the Application Specific Instruction Processor (ASIP) hardware platform customized for the trellis search-based path allocation algorithm is presented. The shortest available and lowest-cost (SALC) path search algorithm is proposed to improve the success rate of path connection in the ASIP hardware platform. We evaluate the algorithm performance and implementation synthesis results. In order to realize the dynamic connection approach, a short execution cycle of ASIP time is essential. We develop several algorithms to achieve this short execution cycle. The first one is a rectangular region of search algorithm that allows adapting the size and form of path search region according to the particular source-destination positions and considers actual operational constraints. The average execution cycles for searching an optimum path are decreased because the unnecessary region for path-search is excluded. The second one is a path-spreading search algorithm that separates between involved routers and uninvolved routers in path search. The involved routers are selected and spread out from source to destination at each intermediate trellis-search process. The path-search overhead is considerably reduced due to the router involvements. The third one is a three-directional path-spreading search algorithm that eliminates one direction movement among four spreading movements. Because of this reason, the trellis search-based path connection algorithm, which omits the back-tracing process, can be implemented in the ASIP platform. Thus, the whole algorithm execution time can be halved. The last one is a moving regional path search algorithm that significantly reduces computation complexity by selecting a constant dimensional path-search region that affects performance and moving the region from source to destination. The moving regional path search algorithm achieves a considerable decrement of computational complexity.:1 Introduction 1 1.1 NoC-interconnect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2 Connection allocation in a Network-on-Chip 7 2.1 Circuit Switching NoCs . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.1.1 Guaranteed Service in NoCs . . . . . . . . . . . . . . . . . . . 7 2.1.2 Spatial-Division-Multiplexing technique . . . . . . . . . . . . 8 2.1.3 Time-Division-Multiplexing technique . . . . . . . . . . . . . 10 2.2 System architectures employing circuit switching NoCs . . . . . . . . 11 2.2.1 Static and dynamic connection allocation . . . . . . . . . . . 12 2.2.2 Distributed connection allocation technique . . . . . . . . . . 14 2.2.3 Centralized connection allocation technique . . . . . . . . . . 16 2.2.4 Algorithms for centralized connection allocation . . . . . . . . 17 2.2.4.1 Software based run-time path allocation approach . 18 2.2.4.2 Trellis search-based allocation approach . . . . . . . 19 3 Performance analysis methodology for a centralized connection allocator 23 3.1 System model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.2 Performance metrics and analysis methodology . . . . . . . . . . . . 25 3.3 System simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4 Trellis search-based path allocation algorithm 45 4.1 General formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.1.1 Trellis graph structure . . . . . . . . . . . . . . . . . . . . . . 45 4.1.2 Survivor path selection criterion . . . . . . . . . . . . . . . . . 52 ix 4.1.2.1 Branch metric and path metric . . . . . . . . . . . . 52 4.1.2.2 The shortest-available and lowest-cost path selection criterion . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.2 Algorithm Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.2.1 Network topology . . . . . . . . . . . . . . . . . . . . . . . . 55 4.2.2 Network size . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 4.2.3 Time-Division-Multiplexing . . . . . . . . . . . . . . . . . . . 61 4.2.4 NoC interface load diversity . . . . . . . . . . . . . . . . . . . 63 4.2.5 The internal storage of the router . . . . . . . . . . . . . . . . 66 5 ASIP approach for Trellis search-based connection allocation 73 5.1 System model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 5.1.1 Trellis search-based ASIP platform architecture . . . . . . . . 74 5.2 Algorithm for improving success rates of path connection . . . . . . . 81 5.2.1 SALC algorithm for Trellis search-based ASIP platform . . . . 81 5.2.2 Performance evaluation of the SALC algorithm . . . . . . . . 88 5.2.2.1 Simulation results . . . . . . . . . . . . . . . . . . . 88 5.2.2.2 Synthesis results . . . . . . . . . . . . . . . . . . . . 91 5.3 Algorithm for reducing path-search time . . . . . . . . . . . . . . . . 93 5.3.1 Rectangular regional path search algorithm . . . . . . . . . . 93 5.3.2 Path-spreading search algorithm . . . . . . . . . . . . . . . . 99 5.3.3 Three directional path-spreading search algorithm . . . . . . 108 5.3.4 Moving regional path search algorithm . . . . . . . . . . . . . 114 5.3.5 Performance evaluation . . . . . . . . . . . . . . . . . . . . . 123 5.3.5.1 Simulation results . . . . . . . . . . . . . . . . . . . 123 5.3.5.2 Synthesis results . . . . . . . . . . . . . . . . . . . . 126 6 Conclusion and Future work 131 6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 6.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 Bibliography 13

    Time-Predictable Communication on a Time-Division Multiplexing Network-on-Chip Multicore

    Get PDF

    NoC Resource Allocation Based on Physical Design Techniques

    Get PDF
    Networks-on-Chip (NoC) has been recognized as a scalable approach for on-chip communication. Quality-of-Service (QoS) is a fundamental part of application specific NoCs. This thesis focuses on resource allocation on NoC, to improve the capability of NoC for Guaranteed Service (GS). A graph model is adopted to describe physical and temporal sources of a NoC. Based on the graph model, an RRR-based algorithm is proposed for simultaneous routing and time slot allocation. In addition, a negotiation-based algorithm is suggested for achieving power-efficient QoS for application-specific NoCs. Last, a hybrid NoC architecture, which combines circuit switching and packet switching, is developed and investigated. Experimental results show that our techniques outperform previous works

    Automatic synthesis and optimization of chip multiprocessors

    Get PDF
    The microprocessor technology has experienced an enormous growth during the last decades. Rapid downscale of the CMOS technology has led to higher operating frequencies and performance densities, facing the fundamental issue of power dissipation. Chip Multiprocessors (CMPs) have become the latest paradigm to improve the power-performance efficiency of computing systems by exploiting the parallelism inherent in applications. Industrial and prototype implementations have already demonstrated the benefits achieved by CMPs with hundreds of cores.CMP architects are challenged to take many complex design decisions. Only a few of them are:- What should be the ratio between the core and cache areas on a chip?- Which core architectures to select?- How many cache levels should the memory subsystem have?- Which interconnect topologies provide efficient on-chip communication?These and many other aspects create a complex multidimensional space for architectural exploration. Design Automation tools become essential to make the architectural exploration feasible under the hard time-to-market constraints. The exploration methods have to be efficient and scalable to handle future generation on-chip architectures with hundreds or thousands of cores.Furthermore, once a CMP has been fabricated, the need for efficient deployment of the many-core processor arises. Intelligent techniques for task mapping and scheduling onto CMPs are necessary to guarantee the full usage of the benefits brought by the many-core technology. These techniques have to consider the peculiarities of the modern architectures, such as availability of enhanced power saving techniques and presence of complex memory hierarchies.This thesis has several objectives. The first objective is to elaborate the methods for efficient analytical modeling and architectural design space exploration of CMPs. The efficiency is achieved by using analytical models instead of simulation, and replacing the exhaustive exploration with an intelligent search strategy. Additionally, these methods incorporate high-level models for physical planning. The related contributions are described in Chapters 3, 4 and 5 of the document.The second objective of this work is to propose a scalable task mapping algorithm onto general-purpose CMPs with power management techniques, for efficient deployment of many-core systems. This contribution is explained in Chapter 6 of this document.Finally, the third objective of this thesis is to address the issues of the on-chip interconnect design and exploration, by developing a model for simultaneous topology customization and deadlock-free routing in Networks-on-Chip. The developed methodology can be applied to various classes of the on-chip systems, ranging from general-purpose chip multiprocessors to application-specific solutions. Chapter 7 describes the proposed model.The presented methods have been thoroughly tested experimentally and the results are described in this dissertation. At the end of the document several possible directions for the future research are proposed

    Physical parameter-aware Networks-on-Chip design

    Get PDF
    PhD ThesisNetworks-on-Chip (NoCs) have been proposed as a scalable, reliable and power-efficient communication fabric for chip multiprocessors (CMPs) and multiprocessor systems-on-chip (MPSoCs). NoCs determine both the performance and the reliability of such systems, with a significant power demand that is expected to increase due to developments in both technology and architecture. In terms of architecture, an important trend in many-core systems architecture is to increase the number of cores on a chip while reducing their individual complexity. This trend increases communication power relative to computation power. Moreover, technology-wise, power-hungry wires are dominating logic as power consumers as technology scales down. For these reasons, the design of future very large scale integration (VLSI) systems is moving from being computation-centric to communication-centric. On the other hand, chip’s physical parameters integrity, especially power and thermal integrity, is crucial for reliable VLSI systems. However, guaranteeing this integrity is becoming increasingly difficult with the higher scale of integration due to increased power density and operating frequencies that result in continuously increasing temperature and voltage drops in the chip. This is a challenge that may prevent further shrinking of devices. Thus, tackling the challenge of power and thermal integrity of future many-core systems at only one level of abstraction, the chip and package design for example, is no longer sufficient to ensure the integrity of physical parameters. New designtime and run-time strategies may need to work together at different levels of abstraction, such as package, application, network, to provide the required physical parameter integrity for these large systems. This necessitates strategies that work at the level of the on-chip network with its rising power budget. This thesis proposes models, techniques and architectures to improve power and thermal integrity of Network-on-Chip (NoC)-based many-core systems. The thesis is composed of two major parts: i) minimization and modelling of power supply variations to improve power integrity; and ii) dynamic thermal adaptation to improve thermal integrity. This thesis makes four major contributions. The first is a computational model of on-chip power supply variations in NoCs. The proposed model embeds a power delivery model, an NoC activity simulator and a power model. The model is verified with SPICE simulation and employed to analyse power supply variations in synthetic and real NoC workloads. Novel observations regarding power supply noise correlation with different traffic patterns and routing algorithms are found. The second is a new application mapping strategy aiming vii to minimize power supply noise in NoCs. This is achieved by defining a new metric, switching activity density, and employing a force-based objective function that results in minimizing switching density. Significant reductions in power supply noise (PSN) are achieved with a low energy penalty. This reduction in PSN also results in a better link timing accuracy. The third contribution is a new dynamic thermal-adaptive routing strategy to effectively diffuse heat from the NoC-based threedimensional (3D) CMPs, using a dynamic programming (DP)-based distributed control architecture. Moreover, a new approach for efficient extension of two-dimensional (2D) partially-adaptive routing algorithms to 3D is presented. This approach improves three-dimensional networkon- chip (3D NoC) routing adaptivity while ensuring deadlock-freeness. Finally, the proposed thermal-adaptive routing is implemented in field-programmable gate array (FPGA), and implementation challenges, for both thermal sensing and the dynamic control architecture are addressed. The proposed routing implementation is evaluated in terms of both functionality and performance. The methodologies and architectures proposed in this thesis open a new direction for improving the power and thermal integrity of future NoC-based 2D and 3D many-core architectures

    Design and implementation of NoC routers and their application to Prdt-based NoC\u27s

    Full text link
    With a communication-centric design style, Networks-on-Chips (NoCs) emerges as a new paradigm of Systems-on-Chips (SoCs) to overcome the limitations of bus-based communication infrastructure. An important problem in the design of NoCs is the router design, which has great impact on the cost and performance of a NoC system. This thesis is focused on the design and implementation of an optimized parameterized router which can be applied in mesh/torus-based and Perfect Recursive Diagonal Torus (PRDT)-based NoCs; In specific, the router design includes the design and implementation of two routing algorithms (vector routing and circular coded vector routing), the wormhole switching scheme, the scheduling scheme, buffering strategy, and flow control scheme. Correspondingly, the following components are designed and implemented: input controller, output controller, crossbar switch, and scheduler. Verilog HDL codes are generated and synthesized on ASIC platforms. Most components are designed in parameterized way. Performance evaluation of each component of the router in terms of timing, area, and power consumption is conducted. The efficiency of the two routing algorithms and tradeoff between computational time (tsetup) and area are analyzed; To reduce the area cost of the router design, the two major components, the crossbar switch and the scheduler, are optimized. Particularly, for crossbar switch, a comparative study of two crossbar designs is performed with the aid of Magic Layout editor, Synopsys CosmosSE and Awaves; Based on the router design, the PRDT network composed of 4x4 routers is designed and synthesized on ASIC platforms
    • …
    corecore