39 research outputs found

    Efficient Connection Allocator in Network-on-Chip

    Get PDF
    As semiconductor technologies develop, a System-on-Chip (SoC) that integrates all semiconductor intellectual property (IP) cores is suggested and widely used for various applications. A traditional bus interconnection does not support transmitting data between IP cores for high performance. Because of this reason, a Network-on-Chip (NoC) has been suggested to provide an efficient and scalable solution to interconnect among all IP cores. High throughput and low latency have recently become the main important factors of NoC for achieving hard guaranteed real-time systems. In order to guarantee these factors and provide real-time service (i.e., Guaranteed Service, GS), the circuit switching (CS) approach has been widely utilized. The CS approach allocates mutually exclusive paths to transmitting data between different sources and destinations using dedicated NoC resources. However, the exclusive occupancy of the allocated path reduces the efficiency of the overall use of NoC resources. In order to solve this problem, Space-Division-Multiplexing (SDM) and Time-Division-Multiplexing (TDM) techniques have been suggested. SDM implements a circuit switching technique by assigning physically different NoC-links between different connections. Path connections of the SDM technique based on spatial resources assignment do not provide high scalability. In contrast to this, using virtual time slots for a path connection, the TDM technique can share physical links between exclusively established connections, thereby improving NoC path diversity. For all of these mentioned techniques, the factor that significantly impacts the system efficiency or performance scaling is how the path is allocated. In recent years, a dynamic connection allocation approach that can cope with highly dynamic workloads has been gaining attention due to the sudden and diverse demands of applications in real-time systems. There are two groups in the dynamic connection allocation approach. One is a distributed allocation technique, and the other is a centralized allocation technique. While distributed allocation exploits additional logic integrated into the NoC-routers for path search and allocation, the centralized approach makes use of a central unit to manage the path allocation problem. There are several algorithms for the centralized allocation technique. Trellis search-based allocation approach shows the best performance among them. Many algorithms related to centralized connection allocators have been studied extensively during the past decade. However, relatively little attention was paid to methodology in analyzing and evaluating the centralized connection allocation algorithms. In order to further develop the algorithms, it is necessary to understand and evaluate the centralized connection allocator by establishing a new analysis methodology. Thus, this thesis presents a performance analysis methodology for the trellis search-based allocation approach. Firstly, this thesis proposes a system model for analysis. Secondly, performance metrics are defined. Finally, the analysis results of each performance metric related to the trellis search-based allocation approach are presented. Through this analysis, the performance of the trellis search-based allocation approach can be accurately analyzed. Although a simulation is not performed, the upper limit of performance of the trellis search-based allocation approach can also be predicted through the analysis metrics. Additionally, we introduce the general formulation of the trellis search-based path allocation algorithm. The weight values among available paths through the branch metric and path metric are proposed to enable higher performance path connection. Furthermore, according to network size, topology, TDM, interface load delivery, and router internal storage, the performance of trellis search-based path allocation algorithms is also described. In the end, the Application Specific Instruction Processor (ASIP) hardware platform customized for the trellis search-based path allocation algorithm is presented. The shortest available and lowest-cost (SALC) path search algorithm is proposed to improve the success rate of path connection in the ASIP hardware platform. We evaluate the algorithm performance and implementation synthesis results. In order to realize the dynamic connection approach, a short execution cycle of ASIP time is essential. We develop several algorithms to achieve this short execution cycle. The first one is a rectangular region of search algorithm that allows adapting the size and form of path search region according to the particular source-destination positions and considers actual operational constraints. The average execution cycles for searching an optimum path are decreased because the unnecessary region for path-search is excluded. The second one is a path-spreading search algorithm that separates between involved routers and uninvolved routers in path search. The involved routers are selected and spread out from source to destination at each intermediate trellis-search process. The path-search overhead is considerably reduced due to the router involvements. The third one is a three-directional path-spreading search algorithm that eliminates one direction movement among four spreading movements. Because of this reason, the trellis search-based path connection algorithm, which omits the back-tracing process, can be implemented in the ASIP platform. Thus, the whole algorithm execution time can be halved. The last one is a moving regional path search algorithm that significantly reduces computation complexity by selecting a constant dimensional path-search region that affects performance and moving the region from source to destination. The moving regional path search algorithm achieves a considerable decrement of computational complexity.:1 Introduction 1 1.1 NoC-interconnect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2 Connection allocation in a Network-on-Chip 7 2.1 Circuit Switching NoCs . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.1.1 Guaranteed Service in NoCs . . . . . . . . . . . . . . . . . . . 7 2.1.2 Spatial-Division-Multiplexing technique . . . . . . . . . . . . 8 2.1.3 Time-Division-Multiplexing technique . . . . . . . . . . . . . 10 2.2 System architectures employing circuit switching NoCs . . . . . . . . 11 2.2.1 Static and dynamic connection allocation . . . . . . . . . . . 12 2.2.2 Distributed connection allocation technique . . . . . . . . . . 14 2.2.3 Centralized connection allocation technique . . . . . . . . . . 16 2.2.4 Algorithms for centralized connection allocation . . . . . . . . 17 2.2.4.1 Software based run-time path allocation approach . 18 2.2.4.2 Trellis search-based allocation approach . . . . . . . 19 3 Performance analysis methodology for a centralized connection allocator 23 3.1 System model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.2 Performance metrics and analysis methodology . . . . . . . . . . . . 25 3.3 System simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4 Trellis search-based path allocation algorithm 45 4.1 General formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.1.1 Trellis graph structure . . . . . . . . . . . . . . . . . . . . . . 45 4.1.2 Survivor path selection criterion . . . . . . . . . . . . . . . . . 52 ix 4.1.2.1 Branch metric and path metric . . . . . . . . . . . . 52 4.1.2.2 The shortest-available and lowest-cost path selection criterion . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.2 Algorithm Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.2.1 Network topology . . . . . . . . . . . . . . . . . . . . . . . . 55 4.2.2 Network size . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 4.2.3 Time-Division-Multiplexing . . . . . . . . . . . . . . . . . . . 61 4.2.4 NoC interface load diversity . . . . . . . . . . . . . . . . . . . 63 4.2.5 The internal storage of the router . . . . . . . . . . . . . . . . 66 5 ASIP approach for Trellis search-based connection allocation 73 5.1 System model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 5.1.1 Trellis search-based ASIP platform architecture . . . . . . . . 74 5.2 Algorithm for improving success rates of path connection . . . . . . . 81 5.2.1 SALC algorithm for Trellis search-based ASIP platform . . . . 81 5.2.2 Performance evaluation of the SALC algorithm . . . . . . . . 88 5.2.2.1 Simulation results . . . . . . . . . . . . . . . . . . . 88 5.2.2.2 Synthesis results . . . . . . . . . . . . . . . . . . . . 91 5.3 Algorithm for reducing path-search time . . . . . . . . . . . . . . . . 93 5.3.1 Rectangular regional path search algorithm . . . . . . . . . . 93 5.3.2 Path-spreading search algorithm . . . . . . . . . . . . . . . . 99 5.3.3 Three directional path-spreading search algorithm . . . . . . 108 5.3.4 Moving regional path search algorithm . . . . . . . . . . . . . 114 5.3.5 Performance evaluation . . . . . . . . . . . . . . . . . . . . . 123 5.3.5.1 Simulation results . . . . . . . . . . . . . . . . . . . 123 5.3.5.2 Synthesis results . . . . . . . . . . . . . . . . . . . . 126 6 Conclusion and Future work 131 6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 6.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 Bibliography 13

    Time-Predictable Communication on a Time-Division Multiplexing Network-on-Chip Multicore

    Get PDF

    Erreichen von Performance in Netzwerken-On-Chip fĂĽr Echtzeitsysteme

    Get PDF
    In many new applications, such as in automatic driving, high performance requirements have reached safety critical real-time systems. Consequently, Networks-on-Chip (NoCs) must efficiently host new sets of highly dynamic workloads e.g., high resolution sensor fusion and data processing, autonomous decision’s making combined with machine learning. The static platform management, as used in current safety critical systems, is no more sufficient to provide the needed level of service. A dynamic platform management could meet the challenge, but it usually suffers from a lack of predictability and the simplicity necessary for certification of safety and real-time properties. In this work, we propose a novel, global and dynamic arbitration for NoCs with real-time QoS requirements. The mechanism decouples the admission control from arbitration in routers thereby simplifying a dynamic adaptation and real-time analysis. Consequently, the proposed solution allows the deployment of a sophisticated contract-based QoS provisioning without introducing complicated and hard to maintain schemes, known from the frequently applied static arbiters. The presented work introduces an overlay network to synchronize transmissions using arbitration units called Resource Managers (RMs), which allows global and work-conserving scheduling. The description of resource allocation strategies is supplemented by protocol design and verification methodology bringing adaptive control to NoC communication in setups with different QoS requirements and traffic classes. For doing that, a formal worst-case timing analysis for the mechanism has been proposed which demonstrates that this solution not only exposes higher performance in simulation but, even more importantly, consistently reaches smaller formally guaranteed worst-case latencies than other strategies for realistic levels of system's utilization. The approach is not limited to a specific network architecture or topology as the mechanism does not require modifications of routers and therefore can be used together with the majority of existing manycore systems. Indeed, the evaluation followed using the generic performance optimized router designs, as well as two systems-on-chip focused on real-time deployments. The results confirmed that the proposed approach proves to exhibit significantly higher average performance in simulation and execution.In vielen neuen sicherheitskritische Anwendungen, wie z.B. dem automatisierten Fahren, werden große Anforderungen an die Leistung von Echtzeitsysteme gestellt. Daher müssen Networks-on-Chip (NoCs) neue, hochdynamische Workloads wie z.B. hochauflösende Sensorfusion und Datenverarbeitung oder autonome Entscheidungsfindung kombiniert mit maschineller Lernen, effizient auf einem System unterbringen. Die Steuerung der zugrunde liegenden NoC-Architektur, muss die Systemsicherheit vor Fehlern, resultierend aus dem dynamischen Verhalten des Systems schützen und gleichzeitig die geforderte Performance bereitstellen. In dieser Arbeit schlagen wir eine neuartige, globale und dynamische Steuerung für NoCs mit Echtzeit QoS Anforderungen vor. Das Schema entkoppelt die Zutrittskontrolle von der Arbitrierung in Routern. Hierdurch wird eine dynamische Anpassung ermöglicht und die Echtzeitanalyse vereinfacht. Der Einsatz einer ausgefeilten vertragsbasierten Ressourcen-Zuweisung wird so ermöglicht, ohne komplexe und schwer wartbare Mechanismen, welche bereits aus dem statischen Plattformmanagement bekannt sind einzuführen. Diese Arbeit stellt ein übergelagertes Netzwerk vor, welches Übertragungen mit Hilfe von Arbitrierungseinheiten, den so genannten Resource Managern (RMs), synchronisiert. Dieses überlagerte Netzwerk ermöglicht eine globale und lasterhaltende Steuerung. Die Beschreibung verschiedener Ressourcenzuweisungstrategien wird ergänzt durch ein Protokolldesign und Methoden zur Verifikation der adaptiven NoC Steuerung mit unterschiedlichen QoS Anforderungen und Verkehrsklassen. Hierfür wird eine formale Worst Case Timing Analyse präsentiert, welche das vorgestellte Verfahren abbildet. Die Resultate bestätitgen, dass die präsentierte Lösung nicht nur eine höhere Performance in der Simulation bietet, sondern auch formal kleinere Worst-Case Latenzen für realistische Systemauslastungen als andere Strategien garantiert. Der vorgestellte Ansatz ist nicht auf eine bestimmte Netzwerkarchitektur oder Topologie beschränkt, da der Mechanismus keine Änderungen an den unterliegenden Routern erfordert und kann daher zusammen mit bestehenden Manycore-Systemen eingesetzt werden. Die Evaluierung erfolgte auf Basis eines leistungsoptimierten Router-Designs sowie zwei auf Echtzeit-Anwendungen fokusierten Platformen. Die Ergebnisse bestätigten, dass der vorgeschlagene Ansatz im Durchschnitt eine deutlich höhere Leistung in der Simulation und Ausführung liefert
    corecore