36 research outputs found
Embedded dynamic programming networks for networks-on-chip
PhD ThesisRelentless technology downscaling and recent technological advancements
in three dimensional integrated circuit (3D-IC) provide a promising
prospect to realize heterogeneous system-on-chip (SoC) and homogeneous
chip multiprocessor (CMP) based on the networks-onchip
(NoCs) paradigm with augmented scalability, modularity and
performance. In many cases in such systems, scheduling and managing
communication resources are the major design and implementation
challenges instead of the computing resources. Past research
efforts were mainly focused on complex design-time or simple heuristic
run-time approaches to deal with the on-chip network resource
management with only local or partial information about the network.
This could yield poor communication resource utilizations and amortize
the benefits of the emerging technologies and design methods.
Thus, the provision for efficient run-time resource management in
large-scale on-chip systems becomes critical. This thesis proposes a
design methodology for a novel run-time resource management infrastructure
that can be realized efficiently using a distributed architecture,
which closely couples with the distributed NoC infrastructure. The
proposed infrastructure exploits the global information and status
of the network to optimize and manage the on-chip communication
resources at run-time.
There are four major contributions in this thesis. First, it presents a
novel deadlock detection method that utilizes run-time transitive closure
(TC) computation to discover the existence of deadlock-equivalence
sets, which imply loops of requests in NoCs. This detection scheme,
TC-network, guarantees the discovery of all true-deadlocks without
false alarms in contrast to state-of-the-art approximation and heuristic
approaches. Second, it investigates the advantages of implementing
future on-chip systems using three dimensional (3D) integration and
presents the design, fabrication and testing results of a TC-network
implemented in a fully stacked three-layer 3D architecture using a
through-silicon via (TSV) complementary metal-oxide semiconductor
(CMOS) technology. Testing results demonstrate the effectiveness
of such a TC-network for deadlock detection with minimal computational
delay in a large-scale network. Third, it introduces an adaptive
strategy to effectively diffuse heat throughout the three dimensional
network-on-chip (3D-NoC) geometry. This strategy employs a dynamic
programming technique to select and optimize the direction of data
manoeuvre in NoC. It leads to a tool, which is based on the accurate
HotSpot thermal model and SystemC cycle accurate model, to simulate
the thermal system and evaluate the proposed approach. Fourth, it
presents a new dynamic programming-based run-time thermal management
(DPRTM) system, including reactive and proactive schemes, to
effectively diffuse heat throughout NoC-based CMPs by routing packets
through the coolest paths, when the temperature does not exceed
chip’s thermal limit. When the thermal limit is exceeded, throttling is
employed to mitigate heat in the chip and DPRTM changes its course
to avoid throttled paths and to minimize the impact of throttling on
chip performance.
This thesis enables a new avenue to explore a novel run-time resource
management infrastructure for NoCs, in which new methodologies
and concepts are proposed to enhance the on-chip networks for
future large-scale 3D integration.Iraqi Ministry of Higher Education and Scientific Research (MOHESR)
Design and Programming Methods for Reconfigurable Multi-Core Architectures using a Network-on-Chip-Centric Approach
A current trend in the semiconductor industry is the use of Multi-Processor Systems-on-Chip (MPSoCs) for a wide variety of applications such as image processing, automotive, multimedia, and robotic systems. Most applications gain performance advantages by executing parallel tasks on multiple processors due to the inherent parallelism. Moreover, heterogeneous structures provide high performance/energy efficiency, since application-specific processing elements (PEs) can be exploited. The increasing number of heterogeneous PEs leads to challenging communication requirements. To overcome this challenge, Networks-on-Chip (NoCs) have emerged as scalable on-chip interconnect. Nevertheless, NoCs have to deal with many design parameters such as virtual channels, routing algorithms and buffering techniques to fulfill the system requirements.
This thesis highly contributes to the state-of-the-art of FPGA-based MPSoCs and NoCs. In the following, the three major contributions are introduced.
As a first major contribution, a novel router concept is presented that efficiently utilizes communication times by performing sequences of arithmetic operations on the data that is transferred. The internal input buffers of the routers are exchanged with processing units that are capable of executing operations. Two different architectures of such processing units are presented. The first architecture provides multiply and accumulate operations which are often used in signal processing applications. The second architecture introduced as Application-Specific Instruction Set Routers (ASIRs) contains a processing unit capable of executing any operation and hence, it is not limited to multiply and accumulate operations. An internal processing core located in ASIRs can be developed in C/C++ using high-level synthesis.
The second major contribution comprises application and performance explorations of the novel router concept. Models that approximate the achievable speedup and the end-to-end latency of ASIRs are derived and discussed to show the benefits in terms of performance. Furthermore, two applications using an ASIR-based MPSoC are implemented and evaluated on a Xilinx Zynq SoC. The first application is an image processing algorithm consisting of a Sobel filter, an RGB-to-Grayscale conversion, and a threshold operation. The second application is a system that helps visually impaired people by navigating them through unknown indoor environments. A Light Detection and Ranging (LIDAR) sensor scans the environment, while Inertial Measurement Units (IMUs) measure the orientation of the user to generate an audio signal that makes the distance as well as the orientation of obstacles audible. This application consists of multiple parallel tasks that are mapped to an ASIR-based MPSoC. Both applications show the performance advantages of ASIRs compared to a conventional NoC-based MPSoC. Furthermore, dynamic partial reconfiguration in terms of relocation and security aspects are investigated.
The third major contribution refers to development and programming methodologies of NoC-based MPSoCs. A software-defined approach is presented that combines the design and programming of heterogeneous MPSoCs. In addition, a Kahn-Process-Network (KPN) –based model is designed to describe parallel applications for MPSoCs using ASIRs. The KPN-based model is extended to support not only the mapping of tasks to NoC-based MPSoCs but also the mapping to ASIR-based MPSoCs. A static mapping methodology is presented that assigns tasks to ASIRs and processors for a given KPN-model. The impact of external hardware components such as sensors, actuators and accelerators connected to the processors is also discussed which makes the approach of high interest for embedded systems
Vorhersagbares und zur Laufzeit adaptierbares On-Chip Netzwerk fĂĽr gemischt kritische Echtzeitsysteme
The industry of safety-critical and dependable embedded systems calls for even cheaper, high performance platforms that allow flexibility and an efficient verification of safety and real-time requirements. To cope with the increasing complexity of interconnected functions and to reduce the cost and power consumption of the system, multicore systems are used to efficiently integrate different processing units in the same chip. Networks-on-chip (NoCs), as a modular interconnect, are used as a promising solution for such multiprocessor systems on chip (MPSoCs), due to their scalability and performance.
For safety-critical systems, a major goal is the avoidance of hazards. For this, safety-critical systems are qualified or even certified to prove the correctness of the functioning under all possible cases. A predictable behaviour of the NoC can help to ease the qualification process of the system. To achieve the required predictability, designers have two classes of solutions: quality of service mechanisms and (formal) analysis. For mixed-criticality systems, isolation and analysis approaches must be combined to efficiently achieve the desired predictability.
Traditional NoC analysis and architecture concepts tackle only a subpart of the challenges: they focus on either performance or predictability. Existing, predictable NoCs are deemed too expensive and inflexible to host a variety of applications with opposing constraints. And state-of-the-art analyses neglect certain platform properties to verify the behaviour. Together this leads to a high over-provisioning of the hardware resources as well as adverse impacts on system performance, and on the flexibility of the system.
In this work we tackle these challenges and develop a predictable and runtime-adaptable NoC architecture that efficiently integrates mixed-critical applications with opposing constraints. Additionally, we present a modelling and analysis framework for NoCs that accounts for backpressure. This framework enables to evaluate the performance and reliability early at design time. Hence, the designer can assess multiple design decisions by using abstract models and formal approaches.Die Industrie der sicherheitskritischen und zuverlässigen eingebetteten Systeme verlangt nach noch günstigeren, leistungsfähigeren Plattformen, welche Flexibilität und eine effiziente Überprüfung der Sicherheits- und Echtzeitanforderungen ermöglichen. Um der zunehmenden Komplexität der zunehmend vernetzten Funktionen gerecht zu werden und die Kosten und den Stromverbrauch eines Systems zu reduzieren, werden Mehrkern-Systeme eingesetzt. On-Chip Netzwerke werden aufgrund ihrer Skalierbarkeit und Leistung als vielversprechende Lösung für solch Mehrkern-Systeme eingesetzt.
Bei sicherheitskritischen Systemen ist die Vermeidung von Gefahren ein wesentliches Ziel. Dazu werden sicherheitskritische Systeme qualifiziert oder zertifiziert, um die Funktionsfähigkeit in allen möglichen Fällen nachzuweisen. Ein vorhersehbares Verhalten des on-Chip Netzwerks kann dabei helfen, den Qualifizierungsprozess des Systems zu erleichtern. Um die erforderliche Vorhersagbarkeit zu erreichen, gibt es zwei Klassen von Lösungen: Quality of Service Mechanismen und (formale) Analyse. Für Systeme mit gemischter Relevanz müssen Isolationsmechanismen und Analyseansätze kombiniert werden, um die gewünschte Vorhersagbarkeit effizient zu erreichen.
Traditionelle Analyse- und Architekturkonzepte für on-Chip Netzwerke lösen nur einen Teil dieser Herausforderungen: sie konzentrieren sich entweder auf Leistung oder Vorhersagbarkeit. Existierende vorhersagbare on-Chip Netzwerke werden als zu teuer und unflexibel erachtet, um eine Vielzahl von Anwendungen mit gegensätzlichen Anforderungen zu integrieren. Und state-of-the-art Analysen vernachlässigen bzw. vereinfachen bestimmte Plattformeigenschaften, um das Verhalten überprüfen zu können. Dies führt zu einer hohen Überbereitstellung der Hardware-Ressourcen als auch zu negativen Auswirkungen auf die Systemleistung und auf die Flexibilität des Systems.
In dieser Arbeit gehen wir auf diese Herausforderungen ein und entwickeln eine vorhersehbare und zur Laufzeit anpassbare Architektur für on-Chip Netzwerke, welche gemischt-kritische Anwendungen effizient integriert. Zusätzlich stellen wir ein Modellierungs- und Analyseframework für on-Chip Netzwerke vor, das den Paketrückstau berücksichtigt. Dieses Framework ermöglicht es, Designentscheidungen anhand abstrakter Modelle und formaler Ansätze frühzeitig beurteilen
Analysis of network-on-chip topologies for cost-efficient chip multiprocessors
Abstract not availableMarta OrtĂn-ObĂłn, DarĂo Suárez-Gracia, MarĂa Villarroya-GaudĂł, Cruz Izu, VĂctor Viñals-YĂşfer
On Energy Efficient Computing Platforms
In accordance with the Moore's law, the increasing number of on-chip integrated transistors has enabled modern computing platforms with not only higher processing power but also more affordable prices. As a result, these platforms, including portable devices, work stations and data centres, are becoming an inevitable part of the human society. However, with the demand for portability and raising cost of power, energy efficiency has emerged to be a major concern for modern computing platforms.
As the complexity of on-chip systems increases, Network-on-Chip (NoC) has been proved as an efficient communication architecture which can further improve system performances and scalability while reducing the design cost. Therefore, in this thesis, we study and propose energy optimization approaches based on NoC architecture, with special focuses on the following aspects.
As the architectural trend of future computing platforms, 3D systems have many bene ts including higher integration density, smaller footprint, heterogeneous integration, etc. Moreover, 3D technology can signi cantly improve the network communication and effectively avoid long wirings, and therefore, provide higher system performance and energy efficiency.
With the dynamic nature of on-chip communication in large scale NoC based systems, run-time system optimization is of crucial importance in order to achieve higher system reliability and essentially energy efficiency. In this thesis, we propose an agent based system design approach where agents are on-chip components which monitor and control system parameters such as supply voltage, operating frequency, etc. With this approach, we have analysed the implementation alternatives for dynamic voltage and frequency scaling and power gating techniques at different granularity, which reduce both dynamic and leakage energy consumption.
Topologies, being one of the key factors for NoCs, are also explored for energy saving purpose. A Honeycomb NoC architecture is proposed in this thesis with turn-model based deadlock-free routing algorithms. Our analysis and simulation based evaluation show that Honeycomb NoCs outperform their Mesh based counterparts in terms of network cost, system performance as well as energy efficiency.Siirretty Doriast
Physical parameter-aware Networks-on-Chip design
PhD ThesisNetworks-on-Chip (NoCs) have been proposed as a scalable, reliable
and power-efficient communication fabric for chip multiprocessors
(CMPs) and multiprocessor systems-on-chip (MPSoCs). NoCs determine
both the performance and the reliability of such systems, with a
significant power demand that is expected to increase due to developments
in both technology and architecture. In terms of architecture, an
important trend in many-core systems architecture is to increase the
number of cores on a chip while reducing their individual complexity.
This trend increases communication power relative to computation
power. Moreover, technology-wise, power-hungry wires are dominating
logic as power consumers as technology scales down. For these
reasons, the design of future very large scale integration (VLSI) systems
is moving from being computation-centric to communication-centric.
On the other hand, chip’s physical parameters integrity, especially
power and thermal integrity, is crucial for reliable VLSI systems. However,
guaranteeing this integrity is becoming increasingly difficult with
the higher scale of integration due to increased power density and operating
frequencies that result in continuously increasing temperature
and voltage drops in the chip. This is a challenge that may prevent
further shrinking of devices. Thus, tackling the challenge of power
and thermal integrity of future many-core systems at only one level
of abstraction, the chip and package design for example, is no longer
sufficient to ensure the integrity of physical parameters. New designtime
and run-time strategies may need to work together at different
levels of abstraction, such as package, application, network, to provide
the required physical parameter integrity for these large systems. This
necessitates strategies that work at the level of the on-chip network
with its rising power budget.
This thesis proposes models, techniques and architectures to improve
power and thermal integrity of Network-on-Chip (NoC)-based
many-core systems. The thesis is composed of two major parts: i)
minimization and modelling of power supply variations to improve
power integrity; and ii) dynamic thermal adaptation to improve thermal
integrity. This thesis makes four major contributions. The first is
a computational model of on-chip power supply variations in NoCs.
The proposed model embeds a power delivery model, an NoC activity
simulator and a power model. The model is verified with SPICE simulation
and employed to analyse power supply variations in synthetic
and real NoC workloads. Novel observations regarding power supply
noise correlation with different traffic patterns and routing algorithms
are found. The second is a new application mapping strategy aiming
vii
to minimize power supply noise in NoCs. This is achieved by defining
a new metric, switching activity density, and employing a force-based
objective function that results in minimizing switching density. Significant
reductions in power supply noise (PSN) are achieved with a low
energy penalty. This reduction in PSN also results in a better link timing
accuracy. The third contribution is a new dynamic thermal-adaptive
routing strategy to effectively diffuse heat from the NoC-based threedimensional
(3D) CMPs, using a dynamic programming (DP)-based distributed
control architecture. Moreover, a new approach for efficient extension
of two-dimensional (2D) partially-adaptive routing algorithms
to 3D is presented. This approach improves three-dimensional networkon-
chip (3D NoC) routing adaptivity while ensuring deadlock-freeness.
Finally, the proposed thermal-adaptive routing is implemented in
field-programmable gate array (FPGA), and implementation challenges,
for both thermal sensing and the dynamic control architecture are addressed.
The proposed routing implementation is evaluated in terms
of both functionality and performance.
The methodologies and architectures proposed in this thesis open a
new direction for improving the power and thermal integrity of future
NoC-based 2D and 3D many-core architectures
Erreichen von Performance in Netzwerken-On-Chip fĂĽr Echtzeitsysteme
In many new applications, such as in automatic driving, high performance requirements have reached safety critical real-time systems. Consequently, Networks-on-Chip (NoCs) must efficiently host new sets of highly dynamic workloads e.g., high resolution sensor fusion and data processing, autonomous decision’s making combined with machine learning.
The static platform management, as used in current safety critical systems, is no more sufficient to provide the needed level of service. A dynamic platform management could meet the challenge, but it usually suffers from a lack of predictability and the simplicity necessary for certification of safety and real-time properties. In this work, we propose a novel, global and dynamic arbitration for NoCs
with real-time QoS requirements. The mechanism decouples the admission control from arbitration in routers thereby simplifying a dynamic adaptation and real-time analysis. Consequently, the proposed solution allows the deployment of a sophisticated contract-based QoS provisioning without introducing complicated and hard to maintain schemes, known from the frequently applied static arbiters.
The presented work introduces an overlay network to synchronize transmissions using arbitration units called Resource Managers (RMs), which allows global and work-conserving scheduling. The description of resource allocation strategies is supplemented by protocol design and verification methodology bringing adaptive control to NoC communication in setups with different QoS requirements and traffic classes. For doing that, a formal worst-case timing analysis for the mechanism has been proposed which demonstrates that this solution not only exposes higher performance in simulation but, even more importantly, consistently reaches smaller formally guaranteed worst-case latencies than other strategies for realistic levels of system's utilization.
The approach is not limited to a specific network architecture or topology as the mechanism does not require modifications of routers and therefore can be used together with the majority of existing manycore systems. Indeed, the evaluation followed using the generic performance optimized router designs, as well as two systems-on-chip focused on real-time deployments. The results confirmed that the proposed approach proves to exhibit significantly higher average performance in simulation and execution.In vielen neuen sicherheitskritische Anwendungen, wie z.B. dem automatisierten
Fahren, werden groĂźe Anforderungen an die Leistung von Echtzeitsysteme gestellt.
Daher mĂĽssen Networks-on-Chip (NoCs) neue, hochdynamische Workloads
wie z.B. hochauflösende Sensorfusion und Datenverarbeitung oder autonome Entscheidungsfindung
kombiniert mit maschineller Lernen, effizient auf einem System unterbringen.
Die Steuerung der zugrunde liegenden NoC-Architektur, muss die Systemsicherheit vor Fehlern,
resultierend aus dem dynamischen Verhalten des Systems schĂĽtzen und
gleichzeitig die geforderte Performance bereitstellen.
In dieser Arbeit schlagen wir eine neuartige, globale und dynamische Steuerung
fĂĽr NoCs mit Echtzeit QoS Anforderungen vor. Das Schema entkoppelt die Zutrittskontrolle
von der Arbitrierung in Routern. Hierdurch wird eine dynamische Anpassung
ermöglicht und die Echtzeitanalyse vereinfacht. Der Einsatz einer ausgefeilten
vertragsbasierten Ressourcen-Zuweisung wird so ermöglicht, ohne komplexe und schwer wartbare Mechanismen, welche bereits aus dem statischen Plattformmanagement bekannt sind einzuführen.
Diese Arbeit stellt ein ĂĽbergelagertes Netzwerk vor, welches Ăśbertragungen mit
Hilfe von Arbitrierungseinheiten, den so genannten Resource Managern (RMs),
synchronisiert. Dieses überlagerte Netzwerk ermöglicht eine globale und lasterhaltende
Steuerung. Die Beschreibung verschiedener Ressourcenzuweisungstrategien
wird ergänzt durch ein Protokolldesign und Methoden zur Verifikation der
adaptiven NoC Steuerung mit unterschiedlichen QoS Anforderungen und Verkehrsklassen.
Hierfür wird eine formale Worst Case Timing Analyse präsentiert,
welche das vorgestellte Verfahren abbildet. Die Resultate bestätitgen, dass die präsentierte
Lösung nicht nur eine höhere Performance in der Simulation bietet, sondern
auch formal kleinere Worst-Case Latenzen fĂĽr realistische Systemauslastungen
als andere Strategien garantiert.
Der vorgestellte Ansatz ist nicht auf eine bestimmte Netzwerkarchitektur oder
Topologie beschränkt, da der Mechanismus keine Änderungen an den unterliegenden
Routern erfordert und kann daher zusammen mit bestehenden Manycore-Systemen
eingesetzt werden. Die Evaluierung erfolgte auf Basis eines leistungsoptimierten
Router-Designs sowie zwei auf Echtzeit-Anwendungen fokusierten Platformen.
Die Ergebnisse bestätigten, dass der vorgeschlagene Ansatz im Durchschnitt
eine deutlich höhere Leistung in der Simulation und Ausführung liefert
Software-based and regionally-oriented traffic management in Networks-on-Chip
Since the introduction of chip-multiprocessor systems, the number of integrated cores has been steady growing and workload applications have been adapted to exploit the increasing parallelism. This changed the importance of efficient on-chip communication significantly and the infrastructure has to keep step with these new requirements.
The work at hand makes significant contributions to the state-of-the-art of the latest generation of such solutions, called Networks-on-Chip, to improve the performance, reliability, and flexible management of these on-chip infrastructures
Approches d'optimisation et de personnalisation des réseaux sur puce (NoC : Networks on Chip)
Systems-on-chip (SoC) have become more and more complex due to the development of integrated circuit technology.Recent studies have shown that in order to improve the performance of a specific SoC application domain, the on-chipinter-connects (OCI) architecture must be customized at design-time or at run-time. Related approaches generallyprovide application-specific SoCs tailored to specific applications. The aim of this thesis is to carry out new approachesfor Network-on-Chip (NoC) and study their performances, especially in terms of latency, throughput, energyconsumption and simplicity of implementation.We have proposed an approach to allow designers to customize a candidate OCI architecture by adding strategiclinks in order to match large application workload. The analytical evaluation focuses on improving the physicalparameters of the NoC topology regardless of the application that should run on. The evaluation by simulationfocuses to evaluate the communication performances of the NoC. Simulations results show the effectiveness ofthis approach to improve the NoC performances. We have also introduced a compartmental Fluid-flow basedmodeling approach to allocate required resource for each buffer based on the application traffic pattern. Simulationsare conducted and results show the efficiency of this modeling method for a buffer space optimized allocation.Finally, we proposed a joint approach based on a system dynamics theory for evaluating the performance of a flowcontrol algorithm in NoCs. This algorithm allows NoC elements to dynamically adjust their inflow by using afeedback control-based mechanism. Analytical and simulation results showed the viability of this mechanism forcongestion avoidance in NoCs.Les systèmes embarqués sur puce (SoC : Systems-on-Chip) sont devenus de plus en plus complexes grâce à l’évolution de la technologie des circuits intégrés. Des études récentes ont montré que pour améliorer les performances du réseau su puce (NoC : Network-on-Chip), l’architecture de celui-ci pouvait être personnalisée, soit au moment de la conception, soit au moment de l’exécution. L’objectif principal de cette thèse est d’implémenter de nouvelles approches pour améliorer les performances des NoCs, notamment la latence, le débit, la consommation d’énergie, et la simplicité de mise en œuvre.Nous avons proposé une approche pour permettre aux concepteurs de personnaliser l'architecture d’un NoC par insertion de liens stratégiques, pour qu’elle soit adaptée à de nombreuses applications, sous la contrainte d’un budget limité en termes de nombre de liens. L’évaluation analytique porte sur l’amélioration des paramètres physiques de la topologie du NoC sans tenir compte de l’application qui devrait s’exécuter dessus. L’évaluation par simulation porte sur l’évaluation des performances de communication du NoC. Les résultats de simulations montrent l’efficacité de notre approche pour améliorer les performances du NoC. Nous avons également introduit une approche de modélisation par réseau à compartiments pour allouer les ressources nécessaires pour chaque tampon selon le modèle de trafic de l'application cible. Les résultats de simulations montrent l'efficacité de cette approche de modélisation pour l’allocation optimisée de l'espace tampon. Enfin, nous avons proposé une approche conjointe basée sur la théorie des systèmes dynamiques pour évaluer la performance d'un algorithme de contrôle de flux dans les NoCs. Cet algorithme permet aux éléments du NoC d’ajuster dynamiquement leur entrée en utilisant un mécanisme basé sur le contrôle de flux par rétroaction. Les résultats d’évaluations analytiques et de simulation montrent la viabilité de ce mécanisme pour éviter la congestion dans les NoCs
A DRAM Centric NoC Architecture and Topology Design Approach
Most communication traffic in today\u2019s System on Chips (SoC) is DRAM centric. The NoC should be designed to efficiently handle the many-to-one communication pattern, funneling to and from the DRAM controller. In this paper, we motivate the use of a separate network for the DRAM traffic and justify the power overhead and performance improvement obtained, when compared to traditional solutions. We also show how the topology of this DRAM network can be designed and optimized to account for the funnel-shaped pattern. Our experiments on a realistic SoC multimedia benchmark shows a large reduction in power consumption and improvement in performance when compared to existing solutions