36 research outputs found

    Embedded dynamic programming networks for networks-on-chip

    Get PDF
    PhD ThesisRelentless technology downscaling and recent technological advancements in three dimensional integrated circuit (3D-IC) provide a promising prospect to realize heterogeneous system-on-chip (SoC) and homogeneous chip multiprocessor (CMP) based on the networks-onchip (NoCs) paradigm with augmented scalability, modularity and performance. In many cases in such systems, scheduling and managing communication resources are the major design and implementation challenges instead of the computing resources. Past research efforts were mainly focused on complex design-time or simple heuristic run-time approaches to deal with the on-chip network resource management with only local or partial information about the network. This could yield poor communication resource utilizations and amortize the benefits of the emerging technologies and design methods. Thus, the provision for efficient run-time resource management in large-scale on-chip systems becomes critical. This thesis proposes a design methodology for a novel run-time resource management infrastructure that can be realized efficiently using a distributed architecture, which closely couples with the distributed NoC infrastructure. The proposed infrastructure exploits the global information and status of the network to optimize and manage the on-chip communication resources at run-time. There are four major contributions in this thesis. First, it presents a novel deadlock detection method that utilizes run-time transitive closure (TC) computation to discover the existence of deadlock-equivalence sets, which imply loops of requests in NoCs. This detection scheme, TC-network, guarantees the discovery of all true-deadlocks without false alarms in contrast to state-of-the-art approximation and heuristic approaches. Second, it investigates the advantages of implementing future on-chip systems using three dimensional (3D) integration and presents the design, fabrication and testing results of a TC-network implemented in a fully stacked three-layer 3D architecture using a through-silicon via (TSV) complementary metal-oxide semiconductor (CMOS) technology. Testing results demonstrate the effectiveness of such a TC-network for deadlock detection with minimal computational delay in a large-scale network. Third, it introduces an adaptive strategy to effectively diffuse heat throughout the three dimensional network-on-chip (3D-NoC) geometry. This strategy employs a dynamic programming technique to select and optimize the direction of data manoeuvre in NoC. It leads to a tool, which is based on the accurate HotSpot thermal model and SystemC cycle accurate model, to simulate the thermal system and evaluate the proposed approach. Fourth, it presents a new dynamic programming-based run-time thermal management (DPRTM) system, including reactive and proactive schemes, to effectively diffuse heat throughout NoC-based CMPs by routing packets through the coolest paths, when the temperature does not exceed chip’s thermal limit. When the thermal limit is exceeded, throttling is employed to mitigate heat in the chip and DPRTM changes its course to avoid throttled paths and to minimize the impact of throttling on chip performance. This thesis enables a new avenue to explore a novel run-time resource management infrastructure for NoCs, in which new methodologies and concepts are proposed to enhance the on-chip networks for future large-scale 3D integration.Iraqi Ministry of Higher Education and Scientific Research (MOHESR)

    Design and Programming Methods for Reconfigurable Multi-Core Architectures using a Network-on-Chip-Centric Approach

    Get PDF
    A current trend in the semiconductor industry is the use of Multi-Processor Systems-on-Chip (MPSoCs) for a wide variety of applications such as image processing, automotive, multimedia, and robotic systems. Most applications gain performance advantages by executing parallel tasks on multiple processors due to the inherent parallelism. Moreover, heterogeneous structures provide high performance/energy efficiency, since application-specific processing elements (PEs) can be exploited. The increasing number of heterogeneous PEs leads to challenging communication requirements. To overcome this challenge, Networks-on-Chip (NoCs) have emerged as scalable on-chip interconnect. Nevertheless, NoCs have to deal with many design parameters such as virtual channels, routing algorithms and buffering techniques to fulfill the system requirements. This thesis highly contributes to the state-of-the-art of FPGA-based MPSoCs and NoCs. In the following, the three major contributions are introduced. As a first major contribution, a novel router concept is presented that efficiently utilizes communication times by performing sequences of arithmetic operations on the data that is transferred. The internal input buffers of the routers are exchanged with processing units that are capable of executing operations. Two different architectures of such processing units are presented. The first architecture provides multiply and accumulate operations which are often used in signal processing applications. The second architecture introduced as Application-Specific Instruction Set Routers (ASIRs) contains a processing unit capable of executing any operation and hence, it is not limited to multiply and accumulate operations. An internal processing core located in ASIRs can be developed in C/C++ using high-level synthesis. The second major contribution comprises application and performance explorations of the novel router concept. Models that approximate the achievable speedup and the end-to-end latency of ASIRs are derived and discussed to show the benefits in terms of performance. Furthermore, two applications using an ASIR-based MPSoC are implemented and evaluated on a Xilinx Zynq SoC. The first application is an image processing algorithm consisting of a Sobel filter, an RGB-to-Grayscale conversion, and a threshold operation. The second application is a system that helps visually impaired people by navigating them through unknown indoor environments. A Light Detection and Ranging (LIDAR) sensor scans the environment, while Inertial Measurement Units (IMUs) measure the orientation of the user to generate an audio signal that makes the distance as well as the orientation of obstacles audible. This application consists of multiple parallel tasks that are mapped to an ASIR-based MPSoC. Both applications show the performance advantages of ASIRs compared to a conventional NoC-based MPSoC. Furthermore, dynamic partial reconfiguration in terms of relocation and security aspects are investigated. The third major contribution refers to development and programming methodologies of NoC-based MPSoCs. A software-defined approach is presented that combines the design and programming of heterogeneous MPSoCs. In addition, a Kahn-Process-Network (KPN) –based model is designed to describe parallel applications for MPSoCs using ASIRs. The KPN-based model is extended to support not only the mapping of tasks to NoC-based MPSoCs but also the mapping to ASIR-based MPSoCs. A static mapping methodology is presented that assigns tasks to ASIRs and processors for a given KPN-model. The impact of external hardware components such as sensors, actuators and accelerators connected to the processors is also discussed which makes the approach of high interest for embedded systems

    Vorhersagbares und zur Laufzeit adaptierbares On-Chip Netzwerk fĂĽr gemischt kritische Echtzeitsysteme

    Get PDF
    The industry of safety-critical and dependable embedded systems calls for even cheaper, high performance platforms that allow flexibility and an efficient verification of safety and real-time requirements. To cope with the increasing complexity of interconnected functions and to reduce the cost and power consumption of the system, multicore systems are used to efficiently integrate different processing units in the same chip. Networks-on-chip (NoCs), as a modular interconnect, are used as a promising solution for such multiprocessor systems on chip (MPSoCs), due to their scalability and performance. For safety-critical systems, a major goal is the avoidance of hazards. For this, safety-critical systems are qualified or even certified to prove the correctness of the functioning under all possible cases. A predictable behaviour of the NoC can help to ease the qualification process of the system. To achieve the required predictability, designers have two classes of solutions: quality of service mechanisms and (formal) analysis. For mixed-criticality systems, isolation and analysis approaches must be combined to efficiently achieve the desired predictability. Traditional NoC analysis and architecture concepts tackle only a subpart of the challenges: they focus on either performance or predictability. Existing, predictable NoCs are deemed too expensive and inflexible to host a variety of applications with opposing constraints. And state-of-the-art analyses neglect certain platform properties to verify the behaviour. Together this leads to a high over-provisioning of the hardware resources as well as adverse impacts on system performance, and on the flexibility of the system. In this work we tackle these challenges and develop a predictable and runtime-adaptable NoC architecture that efficiently integrates mixed-critical applications with opposing constraints. Additionally, we present a modelling and analysis framework for NoCs that accounts for backpressure. This framework enables to evaluate the performance and reliability early at design time. Hence, the designer can assess multiple design decisions by using abstract models and formal approaches.Die Industrie der sicherheitskritischen und zuverlässigen eingebetteten Systeme verlangt nach noch günstigeren, leistungsfähigeren Plattformen, welche Flexibilität und eine effiziente Überprüfung der Sicherheits- und Echtzeitanforderungen ermöglichen. Um der zunehmenden Komplexität der zunehmend vernetzten Funktionen gerecht zu werden und die Kosten und den Stromverbrauch eines Systems zu reduzieren, werden Mehrkern-Systeme eingesetzt. On-Chip Netzwerke werden aufgrund ihrer Skalierbarkeit und Leistung als vielversprechende Lösung für solch Mehrkern-Systeme eingesetzt. Bei sicherheitskritischen Systemen ist die Vermeidung von Gefahren ein wesentliches Ziel. Dazu werden sicherheitskritische Systeme qualifiziert oder zertifiziert, um die Funktionsfähigkeit in allen möglichen Fällen nachzuweisen. Ein vorhersehbares Verhalten des on-Chip Netzwerks kann dabei helfen, den Qualifizierungsprozess des Systems zu erleichtern. Um die erforderliche Vorhersagbarkeit zu erreichen, gibt es zwei Klassen von Lösungen: Quality of Service Mechanismen und (formale) Analyse. Für Systeme mit gemischter Relevanz müssen Isolationsmechanismen und Analyseansätze kombiniert werden, um die gewünschte Vorhersagbarkeit effizient zu erreichen. Traditionelle Analyse- und Architekturkonzepte für on-Chip Netzwerke lösen nur einen Teil dieser Herausforderungen: sie konzentrieren sich entweder auf Leistung oder Vorhersagbarkeit. Existierende vorhersagbare on-Chip Netzwerke werden als zu teuer und unflexibel erachtet, um eine Vielzahl von Anwendungen mit gegensätzlichen Anforderungen zu integrieren. Und state-of-the-art Analysen vernachlässigen bzw. vereinfachen bestimmte Plattformeigenschaften, um das Verhalten überprüfen zu können. Dies führt zu einer hohen Überbereitstellung der Hardware-Ressourcen als auch zu negativen Auswirkungen auf die Systemleistung und auf die Flexibilität des Systems. In dieser Arbeit gehen wir auf diese Herausforderungen ein und entwickeln eine vorhersehbare und zur Laufzeit anpassbare Architektur für on-Chip Netzwerke, welche gemischt-kritische Anwendungen effizient integriert. Zusätzlich stellen wir ein Modellierungs- und Analyseframework für on-Chip Netzwerke vor, das den Paketrückstau berücksichtigt. Dieses Framework ermöglicht es, Designentscheidungen anhand abstrakter Modelle und formaler Ansätze frühzeitig beurteilen

    On Energy Efficient Computing Platforms

    Get PDF
    In accordance with the Moore's law, the increasing number of on-chip integrated transistors has enabled modern computing platforms with not only higher processing power but also more affordable prices. As a result, these platforms, including portable devices, work stations and data centres, are becoming an inevitable part of the human society. However, with the demand for portability and raising cost of power, energy efficiency has emerged to be a major concern for modern computing platforms. As the complexity of on-chip systems increases, Network-on-Chip (NoC) has been proved as an efficient communication architecture which can further improve system performances and scalability while reducing the design cost. Therefore, in this thesis, we study and propose energy optimization approaches based on NoC architecture, with special focuses on the following aspects. As the architectural trend of future computing platforms, 3D systems have many bene ts including higher integration density, smaller footprint, heterogeneous integration, etc. Moreover, 3D technology can signi cantly improve the network communication and effectively avoid long wirings, and therefore, provide higher system performance and energy efficiency. With the dynamic nature of on-chip communication in large scale NoC based systems, run-time system optimization is of crucial importance in order to achieve higher system reliability and essentially energy efficiency. In this thesis, we propose an agent based system design approach where agents are on-chip components which monitor and control system parameters such as supply voltage, operating frequency, etc. With this approach, we have analysed the implementation alternatives for dynamic voltage and frequency scaling and power gating techniques at different granularity, which reduce both dynamic and leakage energy consumption. Topologies, being one of the key factors for NoCs, are also explored for energy saving purpose. A Honeycomb NoC architecture is proposed in this thesis with turn-model based deadlock-free routing algorithms. Our analysis and simulation based evaluation show that Honeycomb NoCs outperform their Mesh based counterparts in terms of network cost, system performance as well as energy efficiency.Siirretty Doriast

    Physical parameter-aware Networks-on-Chip design

    Get PDF
    PhD ThesisNetworks-on-Chip (NoCs) have been proposed as a scalable, reliable and power-efficient communication fabric for chip multiprocessors (CMPs) and multiprocessor systems-on-chip (MPSoCs). NoCs determine both the performance and the reliability of such systems, with a significant power demand that is expected to increase due to developments in both technology and architecture. In terms of architecture, an important trend in many-core systems architecture is to increase the number of cores on a chip while reducing their individual complexity. This trend increases communication power relative to computation power. Moreover, technology-wise, power-hungry wires are dominating logic as power consumers as technology scales down. For these reasons, the design of future very large scale integration (VLSI) systems is moving from being computation-centric to communication-centric. On the other hand, chip’s physical parameters integrity, especially power and thermal integrity, is crucial for reliable VLSI systems. However, guaranteeing this integrity is becoming increasingly difficult with the higher scale of integration due to increased power density and operating frequencies that result in continuously increasing temperature and voltage drops in the chip. This is a challenge that may prevent further shrinking of devices. Thus, tackling the challenge of power and thermal integrity of future many-core systems at only one level of abstraction, the chip and package design for example, is no longer sufficient to ensure the integrity of physical parameters. New designtime and run-time strategies may need to work together at different levels of abstraction, such as package, application, network, to provide the required physical parameter integrity for these large systems. This necessitates strategies that work at the level of the on-chip network with its rising power budget. This thesis proposes models, techniques and architectures to improve power and thermal integrity of Network-on-Chip (NoC)-based many-core systems. The thesis is composed of two major parts: i) minimization and modelling of power supply variations to improve power integrity; and ii) dynamic thermal adaptation to improve thermal integrity. This thesis makes four major contributions. The first is a computational model of on-chip power supply variations in NoCs. The proposed model embeds a power delivery model, an NoC activity simulator and a power model. The model is verified with SPICE simulation and employed to analyse power supply variations in synthetic and real NoC workloads. Novel observations regarding power supply noise correlation with different traffic patterns and routing algorithms are found. The second is a new application mapping strategy aiming vii to minimize power supply noise in NoCs. This is achieved by defining a new metric, switching activity density, and employing a force-based objective function that results in minimizing switching density. Significant reductions in power supply noise (PSN) are achieved with a low energy penalty. This reduction in PSN also results in a better link timing accuracy. The third contribution is a new dynamic thermal-adaptive routing strategy to effectively diffuse heat from the NoC-based threedimensional (3D) CMPs, using a dynamic programming (DP)-based distributed control architecture. Moreover, a new approach for efficient extension of two-dimensional (2D) partially-adaptive routing algorithms to 3D is presented. This approach improves three-dimensional networkon- chip (3D NoC) routing adaptivity while ensuring deadlock-freeness. Finally, the proposed thermal-adaptive routing is implemented in field-programmable gate array (FPGA), and implementation challenges, for both thermal sensing and the dynamic control architecture are addressed. The proposed routing implementation is evaluated in terms of both functionality and performance. The methodologies and architectures proposed in this thesis open a new direction for improving the power and thermal integrity of future NoC-based 2D and 3D many-core architectures

    Erreichen von Performance in Netzwerken-On-Chip fĂĽr Echtzeitsysteme

    Get PDF
    In many new applications, such as in automatic driving, high performance requirements have reached safety critical real-time systems. Consequently, Networks-on-Chip (NoCs) must efficiently host new sets of highly dynamic workloads e.g., high resolution sensor fusion and data processing, autonomous decision’s making combined with machine learning. The static platform management, as used in current safety critical systems, is no more sufficient to provide the needed level of service. A dynamic platform management could meet the challenge, but it usually suffers from a lack of predictability and the simplicity necessary for certification of safety and real-time properties. In this work, we propose a novel, global and dynamic arbitration for NoCs with real-time QoS requirements. The mechanism decouples the admission control from arbitration in routers thereby simplifying a dynamic adaptation and real-time analysis. Consequently, the proposed solution allows the deployment of a sophisticated contract-based QoS provisioning without introducing complicated and hard to maintain schemes, known from the frequently applied static arbiters. The presented work introduces an overlay network to synchronize transmissions using arbitration units called Resource Managers (RMs), which allows global and work-conserving scheduling. The description of resource allocation strategies is supplemented by protocol design and verification methodology bringing adaptive control to NoC communication in setups with different QoS requirements and traffic classes. For doing that, a formal worst-case timing analysis for the mechanism has been proposed which demonstrates that this solution not only exposes higher performance in simulation but, even more importantly, consistently reaches smaller formally guaranteed worst-case latencies than other strategies for realistic levels of system's utilization. The approach is not limited to a specific network architecture or topology as the mechanism does not require modifications of routers and therefore can be used together with the majority of existing manycore systems. Indeed, the evaluation followed using the generic performance optimized router designs, as well as two systems-on-chip focused on real-time deployments. The results confirmed that the proposed approach proves to exhibit significantly higher average performance in simulation and execution.In vielen neuen sicherheitskritische Anwendungen, wie z.B. dem automatisierten Fahren, werden große Anforderungen an die Leistung von Echtzeitsysteme gestellt. Daher müssen Networks-on-Chip (NoCs) neue, hochdynamische Workloads wie z.B. hochauflösende Sensorfusion und Datenverarbeitung oder autonome Entscheidungsfindung kombiniert mit maschineller Lernen, effizient auf einem System unterbringen. Die Steuerung der zugrunde liegenden NoC-Architektur, muss die Systemsicherheit vor Fehlern, resultierend aus dem dynamischen Verhalten des Systems schützen und gleichzeitig die geforderte Performance bereitstellen. In dieser Arbeit schlagen wir eine neuartige, globale und dynamische Steuerung für NoCs mit Echtzeit QoS Anforderungen vor. Das Schema entkoppelt die Zutrittskontrolle von der Arbitrierung in Routern. Hierdurch wird eine dynamische Anpassung ermöglicht und die Echtzeitanalyse vereinfacht. Der Einsatz einer ausgefeilten vertragsbasierten Ressourcen-Zuweisung wird so ermöglicht, ohne komplexe und schwer wartbare Mechanismen, welche bereits aus dem statischen Plattformmanagement bekannt sind einzuführen. Diese Arbeit stellt ein übergelagertes Netzwerk vor, welches Übertragungen mit Hilfe von Arbitrierungseinheiten, den so genannten Resource Managern (RMs), synchronisiert. Dieses überlagerte Netzwerk ermöglicht eine globale und lasterhaltende Steuerung. Die Beschreibung verschiedener Ressourcenzuweisungstrategien wird ergänzt durch ein Protokolldesign und Methoden zur Verifikation der adaptiven NoC Steuerung mit unterschiedlichen QoS Anforderungen und Verkehrsklassen. Hierfür wird eine formale Worst Case Timing Analyse präsentiert, welche das vorgestellte Verfahren abbildet. Die Resultate bestätitgen, dass die präsentierte Lösung nicht nur eine höhere Performance in der Simulation bietet, sondern auch formal kleinere Worst-Case Latenzen für realistische Systemauslastungen als andere Strategien garantiert. Der vorgestellte Ansatz ist nicht auf eine bestimmte Netzwerkarchitektur oder Topologie beschränkt, da der Mechanismus keine Änderungen an den unterliegenden Routern erfordert und kann daher zusammen mit bestehenden Manycore-Systemen eingesetzt werden. Die Evaluierung erfolgte auf Basis eines leistungsoptimierten Router-Designs sowie zwei auf Echtzeit-Anwendungen fokusierten Platformen. Die Ergebnisse bestätigten, dass der vorgeschlagene Ansatz im Durchschnitt eine deutlich höhere Leistung in der Simulation und Ausführung liefert

    Software-based and regionally-oriented traffic management in Networks-on-Chip

    Get PDF
    Since the introduction of chip-multiprocessor systems, the number of integrated cores has been steady growing and workload applications have been adapted to exploit the increasing parallelism. This changed the importance of efficient on-chip communication significantly and the infrastructure has to keep step with these new requirements. The work at hand makes significant contributions to the state-of-the-art of the latest generation of such solutions, called Networks-on-Chip, to improve the performance, reliability, and flexible management of these on-chip infrastructures

    Approches d'optimisation et de personnalisation des réseaux sur puce (NoC : Networks on Chip)

    Get PDF
    Systems-on-chip (SoC) have become more and more complex due to the development of integrated circuit technology.Recent studies have shown that in order to improve the performance of a specific SoC application domain, the on-chipinter-connects (OCI) architecture must be customized at design-time or at run-time. Related approaches generallyprovide application-specific SoCs tailored to specific applications. The aim of this thesis is to carry out new approachesfor Network-on-Chip (NoC) and study their performances, especially in terms of latency, throughput, energyconsumption and simplicity of implementation.We have proposed an approach to allow designers to customize a candidate OCI architecture by adding strategiclinks in order to match large application workload. The analytical evaluation focuses on improving the physicalparameters of the NoC topology regardless of the application that should run on. The evaluation by simulationfocuses to evaluate the communication performances of the NoC. Simulations results show the effectiveness ofthis approach to improve the NoC performances. We have also introduced a compartmental Fluid-flow basedmodeling approach to allocate required resource for each buffer based on the application traffic pattern. Simulationsare conducted and results show the efficiency of this modeling method for a buffer space optimized allocation.Finally, we proposed a joint approach based on a system dynamics theory for evaluating the performance of a flowcontrol algorithm in NoCs. This algorithm allows NoC elements to dynamically adjust their inflow by using afeedback control-based mechanism. Analytical and simulation results showed the viability of this mechanism forcongestion avoidance in NoCs.Les systèmes embarqués sur puce (SoC : Systems-on-Chip) sont devenus de plus en plus complexes grâce à l’évolution de la technologie des circuits intégrés. Des études récentes ont montré que pour améliorer les performances du réseau su puce (NoC : Network-on-Chip), l’architecture de celui-ci pouvait être personnalisée, soit au moment de la conception, soit au moment de l’exécution. L’objectif principal de cette thèse est d’implémenter de nouvelles approches pour améliorer les performances des NoCs, notamment la latence, le débit, la consommation d’énergie, et la simplicité de mise en œuvre.Nous avons proposé une approche pour permettre aux concepteurs de personnaliser l'architecture d’un NoC par insertion de liens stratégiques, pour qu’elle soit adaptée à de nombreuses applications, sous la contrainte d’un budget limité en termes de nombre de liens. L’évaluation analytique porte sur l’amélioration des paramètres physiques de la topologie du NoC sans tenir compte de l’application qui devrait s’exécuter dessus. L’évaluation par simulation porte sur l’évaluation des performances de communication du NoC. Les résultats de simulations montrent l’efficacité de notre approche pour améliorer les performances du NoC. Nous avons également introduit une approche de modélisation par réseau à compartiments pour allouer les ressources nécessaires pour chaque tampon selon le modèle de trafic de l'application cible. Les résultats de simulations montrent l'efficacité de cette approche de modélisation pour l’allocation optimisée de l'espace tampon. Enfin, nous avons proposé une approche conjointe basée sur la théorie des systèmes dynamiques pour évaluer la performance d'un algorithme de contrôle de flux dans les NoCs. Cet algorithme permet aux éléments du NoC d’ajuster dynamiquement leur entrée en utilisant un mécanisme basé sur le contrôle de flux par rétroaction. Les résultats d’évaluations analytiques et de simulation montrent la viabilité de ce mécanisme pour éviter la congestion dans les NoCs

    A DRAM Centric NoC Architecture and Topology Design Approach

    Get PDF
    Most communication traffic in today\u2019s System on Chips (SoC) is DRAM centric. The NoC should be designed to efficiently handle the many-to-one communication pattern, funneling to and from the DRAM controller. In this paper, we motivate the use of a separate network for the DRAM traffic and justify the power overhead and performance improvement obtained, when compared to traditional solutions. We also show how the topology of this DRAM network can be designed and optimized to account for the funnel-shaped pattern. Our experiments on a realistic SoC multimedia benchmark shows a large reduction in power consumption and improvement in performance when compared to existing solutions
    corecore