28 research outputs found

    NoC Resource Allocation Based on Physical Design Techniques

    Get PDF
    Networks-on-Chip (NoC) has been recognized as a scalable approach for on-chip communication. Quality-of-Service (QoS) is a fundamental part of application specific NoCs. This thesis focuses on resource allocation on NoC, to improve the capability of NoC for Guaranteed Service (GS). A graph model is adopted to describe physical and temporal sources of a NoC. Based on the graph model, an RRR-based algorithm is proposed for simultaneous routing and time slot allocation. In addition, a negotiation-based algorithm is suggested for achieving power-efficient QoS for application-specific NoCs. Last, a hybrid NoC architecture, which combines circuit switching and packet switching, is developed and investigated. Experimental results show that our techniques outperform previous works

    Advanced Connection Allocation Techniques in Circuit Switching Network on Chip

    Get PDF
    With the advancement of semiconductor technology, the System on Chip (SoC) is becoming more and more complex, so the on-chip communication has become a bottleneck of SoC Design. Since the traditional bus system is inefficient and not scalable, the Network-On-Chip (NoC) has emerged as the promising communication mechanism for complex SoCs. As some systems have specific performance requirements, such as a minimum throughput (for real-time streaming data) or bounded latency (for interrupts, process synchronization, etc), communication with Guaranteed Service (GS) support becomes crucial for predictable SoC architectures. Circuit Switching (CS) is a popular approach to support GS, which firstly has to allocate an exclusively connection (circuit) between the source and destination nodes, and then the data packets are delivered over this connection. However, it is inefficient and inflexible because the resource is occupied by single connection during its whole lifetime, which can block other communications. Hence, two extensions of CS have been proposed to share resources: i) Time-Division Multiplexing (TDM), in which the available link capacity is split into multiple time slots to be shared by different flows in TDM scheme; and ii) Space-Division-Multiplexing (SDM), in which only a subset (sub-channel) of the link wires is exclusively allocated to a specific connection, while the remaining wires of the link can be used by other flows. The connection allocation is critical for CS, since the data delivery can start only after the associated connection is allocated. In this thesis, we propose a dedicated hardware connection allocator to solve the dynamic connection allocation problem for CS NoCs, which has to i) allocate a contention-free path between source-destination pairs and ii) allocate appropriate portions of link bandwidth (appropriate number of time slots and subsets) along the path. The dedicated connection allocator, called NoCManager, solves the connection allocation problem by employing a trellis-search based shortest path algorithm. The trellis search can explore all possible paths between source node and destination. Moreover, it shall find the requested path in a fixed low latency and can guarantee the path optimality in terms of path length if the path is available. In this thesis, two different trellis graphs, Forward-Backtrack trellis and Register-Exchange trellis are proposed. The Forward-Backtrack trellis completes the path search in two steps: forward search and backtracking. Firstly, the forward search begins at source node that traverses the network to find the free path. When destination node is reached, the backtrack starts from destination to select the survivor path and collect the associated path parameters. However, Register-Exchange trellis saves the entire survivor path sequences during forward search. Consequently, the backtracking step can be omitted, and thus the allocation time is halved compared to forward-backtrack approaches. Moreover, each trellis graph consists of three categories, unfolded structure, folded structure and bidirectional structure. The unfolded structure can provide high allocation speed while folded structure is more efficient from a hardware point of view. The bidirectional structure starts the search at two sides, source node and destination node simultaneously, so the allocation speed is 2 times faster than previous unidirectional search. Furthermore, in order to address the scalability issue of previous centralized systems, the partitioned architecture (i.e. spatial partitioning technique) is proposed to divide the large system into multiple smaller differentiated logical partitions served by local NoCManagers. This partitioning technique keeps the request load of the manager and manager-node communication overhead moderate. Inside each partition, the path search problem is solved by a local manager with trellis-search algorithm. To establish a path that crosses partitions, the managers communicate with each other in distributed manner to converge the global path. In order to further enhance the path diversity and resource utilization, we adopt the combined TDM and SDM technique. In combined TDM-SDM approach, each SDM sub-channel is split into multiple time slots so that can be shared by multiple flows. Hence, the number of sub-channels can be kept moderate to reduce router complexity, while still providing higher path diversity than TDM scheme. In order to investigate and optimize TDM-SDM partitioning strategy, we studied the influence of different TDM-SDM link partitioning strategies on success rate and path length that allowed us to find the optimal solution. The dedicated connection allocator using the trellis-search algorithm is employed for TDM, SDM and TDM-SDM CS. In the end, we present the router architecture that combines the circuit-switching network (for GS communication) and packet-switching network (for best-effort communication)

    Improving Packet Predictability of Scalable Network-on-Chip Designs without Priority Pre-emptive Arbitration

    Get PDF
    The quest for improving processing power and efficiency is spawning research into many-core systems with hundreds or thousands of cores. With communication being forecast as the foremost performance bottleneck, Network-on-Chips are the favoured communication infrastructure in the context mainly due to reasons like scalability and power efficiency. However, contention between non-preemptive NoC packets can result in variation in packet latencies thus potentially limiting the overall utilisation of the many-core system. Typical latency predictability enhancement techniques like Virtual Channels or Time Division Multiplexing are usually hardware expensive or non-scalable or both. This research explores the use of dynamic and scalable techniques in Network-on-Chip routers to improve packet predictability by countering Head-of-line blocking (blocked low priority packet blocking a high priority packet) and tailbacking (low priority packet utilising the link that is required by a high priority packet) of non-preemptive packets. The Priority forwarding and tunnelling technique introduced is designed to detect Head-of-line blocking situations so that its internal arbitration parameters can be altered (by forwarding packet parameters down the line) to resolve such issues. The Selective packet splitting technique presented allows resolution of tailbacking by emulating the effect of preemption of packets (by splitting packets) by using a low overhead alternative that manipulates packets. Finally, the thesis presents an architecture that allows the routers to have a notion of timeliness in data packets thus enabling packet arbitration based on application-supplied priority and timeliness thus improving the quality of service given to lower priority packets. Furthermore, the techniques presented in the thesis do not require additional hardware with the increase in size of the NoC. This enables the techniques to be scalable, as the size of the NoC or the number of packet priorities the NoC has to handle does not affect the functionality and operation of the techniques

    Erreichen von Performance in Netzwerken-On-Chip für Echtzeitsysteme

    Get PDF
    In many new applications, such as in automatic driving, high performance requirements have reached safety critical real-time systems. Consequently, Networks-on-Chip (NoCs) must efficiently host new sets of highly dynamic workloads e.g., high resolution sensor fusion and data processing, autonomous decision’s making combined with machine learning. The static platform management, as used in current safety critical systems, is no more sufficient to provide the needed level of service. A dynamic platform management could meet the challenge, but it usually suffers from a lack of predictability and the simplicity necessary for certification of safety and real-time properties. In this work, we propose a novel, global and dynamic arbitration for NoCs with real-time QoS requirements. The mechanism decouples the admission control from arbitration in routers thereby simplifying a dynamic adaptation and real-time analysis. Consequently, the proposed solution allows the deployment of a sophisticated contract-based QoS provisioning without introducing complicated and hard to maintain schemes, known from the frequently applied static arbiters. The presented work introduces an overlay network to synchronize transmissions using arbitration units called Resource Managers (RMs), which allows global and work-conserving scheduling. The description of resource allocation strategies is supplemented by protocol design and verification methodology bringing adaptive control to NoC communication in setups with different QoS requirements and traffic classes. For doing that, a formal worst-case timing analysis for the mechanism has been proposed which demonstrates that this solution not only exposes higher performance in simulation but, even more importantly, consistently reaches smaller formally guaranteed worst-case latencies than other strategies for realistic levels of system's utilization. The approach is not limited to a specific network architecture or topology as the mechanism does not require modifications of routers and therefore can be used together with the majority of existing manycore systems. Indeed, the evaluation followed using the generic performance optimized router designs, as well as two systems-on-chip focused on real-time deployments. The results confirmed that the proposed approach proves to exhibit significantly higher average performance in simulation and execution.In vielen neuen sicherheitskritische Anwendungen, wie z.B. dem automatisierten Fahren, werden große Anforderungen an die Leistung von Echtzeitsysteme gestellt. Daher müssen Networks-on-Chip (NoCs) neue, hochdynamische Workloads wie z.B. hochauflösende Sensorfusion und Datenverarbeitung oder autonome Entscheidungsfindung kombiniert mit maschineller Lernen, effizient auf einem System unterbringen. Die Steuerung der zugrunde liegenden NoC-Architektur, muss die Systemsicherheit vor Fehlern, resultierend aus dem dynamischen Verhalten des Systems schützen und gleichzeitig die geforderte Performance bereitstellen. In dieser Arbeit schlagen wir eine neuartige, globale und dynamische Steuerung für NoCs mit Echtzeit QoS Anforderungen vor. Das Schema entkoppelt die Zutrittskontrolle von der Arbitrierung in Routern. Hierdurch wird eine dynamische Anpassung ermöglicht und die Echtzeitanalyse vereinfacht. Der Einsatz einer ausgefeilten vertragsbasierten Ressourcen-Zuweisung wird so ermöglicht, ohne komplexe und schwer wartbare Mechanismen, welche bereits aus dem statischen Plattformmanagement bekannt sind einzuführen. Diese Arbeit stellt ein übergelagertes Netzwerk vor, welches Übertragungen mit Hilfe von Arbitrierungseinheiten, den so genannten Resource Managern (RMs), synchronisiert. Dieses überlagerte Netzwerk ermöglicht eine globale und lasterhaltende Steuerung. Die Beschreibung verschiedener Ressourcenzuweisungstrategien wird ergänzt durch ein Protokolldesign und Methoden zur Verifikation der adaptiven NoC Steuerung mit unterschiedlichen QoS Anforderungen und Verkehrsklassen. Hierfür wird eine formale Worst Case Timing Analyse präsentiert, welche das vorgestellte Verfahren abbildet. Die Resultate bestätitgen, dass die präsentierte Lösung nicht nur eine höhere Performance in der Simulation bietet, sondern auch formal kleinere Worst-Case Latenzen für realistische Systemauslastungen als andere Strategien garantiert. Der vorgestellte Ansatz ist nicht auf eine bestimmte Netzwerkarchitektur oder Topologie beschränkt, da der Mechanismus keine Änderungen an den unterliegenden Routern erfordert und kann daher zusammen mit bestehenden Manycore-Systemen eingesetzt werden. Die Evaluierung erfolgte auf Basis eines leistungsoptimierten Router-Designs sowie zwei auf Echtzeit-Anwendungen fokusierten Platformen. Die Ergebnisse bestätigten, dass der vorgeschlagene Ansatz im Durchschnitt eine deutlich höhere Leistung in der Simulation und Ausführung liefert

    A Scalable and Adaptive Network on Chip for Many-Core Architectures

    Get PDF
    In this work, a scalable network on chip (NoC) for future many-core architectures is proposed and investigated. It supports different QoS mechanisms to ensure predictable communication. Self-optimization is introduced to adapt the energy footprint and the performance of the network to the communication requirements. A fault tolerance concept allows to deal with permanent errors. Moreover, a template-based automated evaluation and design methodology and a synthesis flow for NoCs is introduced

    Fehlertolerante Mehrkernprozessoren für gemischt-kritische Echtzeitsysteme

    Get PDF
    Current and future computing systems must be appropriately designed to cope with random hardware faults in order to provide a dependable service and correct functionality. Dependability has many facets to be addressed when designing a system and that is specially challenging in mixed-critical real-time systems, where safety standards play an important role and where responding in time can be as important as responding correctly or even responding at all. The thesis addresses the dependability of mixed-critical real-time systems, considering three important requirements: integrity, resilience and real-time. More specifically, it looks into the architectural and performance aspects of achieving dependability, concentrating its scope on error detection and handling in hardware -- more specifically in the Network-on-Chip (NoC), the backbone of modern MPSoC -- and on the performance of error handling and recovery in software. The thesis starts by looking at the impacts of random hardware faults on the NoC and on the system, with special focus on soft errors. Then, it addresses the uncovered weaknesses in the NoC by proposing a resilient NoC for mixed-critical real-time systems that is able to provide a highly reliable service with transparent protection for the applications. Formal communication time analysis is provided with common ARQ protocols modeled for NoCs and including a novel ARQ-based protocol optimized for DMAs. After addressing the efficient use of ARQ-based protocols in NoCs, the thesis proposes the Advanced Integrity Q-service (AIQ), a low-overhead mechanism to achieve integrity and real-time guarantees of NoC transactions on an End-to-End (E2E) basis. Inspired by transactions in distributed systems, the mechanism differs from the previous approach in that it does not provide error recovery in hardware but delegates the task to software, making use of existing functionality in cross-layer fault-tolerance solutions. Finally, the thesis addresses error handling in software as seen in cross-layer approaches. It addresses the performance of replicated software execution in many-core platforms. Replicated software execution provides protection to the system against random hardware faults. It relies on hardware-supported error detection and error handling in software. The replica-aware co-scheduling is proposed to achieve high performance with replicated execution, which is not possible with standard real-time schedulers.Um einen zuverlässigen Betrieb und korrekte Funktionalität zu gewährleisten, müssen aktuelle und zukünftige Computersysteme so ausgelegt werden, dass sie mit diesen Fehlern umgehen können. Zuverlässigkeit hat viele Aspekte, die bei der Entwicklung eines Systems berücksichtigt werden müssen. Das gilt insbesondere für Echtzeitsysteme mit gemischter Kritikalität, bei denen Sicherheitsstandards, die ein korrektes und rechtzeitiges Verhalten fordern, eine wichtige Rolle spielen. Diese Dissertation befasst sich mit der Zuverlässigkeit von gemischt-kritischen Echtzeitsystemen unter Berücksichtigung von drei wichtigen Anforderungen: Integrität, Resilienz und Echtzeit. Genauer gesagt, behandelt sie Architektur- und Leistungsaspekte die notwendig sind um Zuverlässigkeit zu erreichen, wobei der Schwerpunkt auf der Fehlererkennung und -behandlung in der Hardware – genauer gesagt im Network-on-Chip (NoC), dem Rückgrat des modernen MPSoC – und auf der Leistung der Fehlerbehandlung und -behebung in der Software liegt. Die Arbeit beginnt mit der Untersuchung der Auswirkung von zufälligen Hardwarefehlern auf das NoC und das System, wobei der Schwerpunkt auf weichen Fehler (soft errors) liegt. Anschließend werden die aufgedeckten Schwachstellen im NoC behoben, indem ein widerstandsfähiges NoC für gemischt-kritische Echtzeitsysteme vorgeschlagen wird, das in der Lage ist, einen höchst zuverlässigen Betrieb mit transparentem Schutz für die Anwendungen zu bieten. Nach der Auseinandersetzung mit der effizienten Nutzung von ARQ-basierten Protokolle in NoCs, wird der Advanced Integrity Q-Service (AIQ) vorgestellt, der ein Mechanismus mit geringem Overhead ist, um Integrität und Echtzeit-Garantien von NoC-Transaktionen auf Ende-zu-Ende (E2E)-Basis zu erreichen. Inspiriert von Transaktionen in verteilten Systemen unterscheidet sich der Mechanismus vom bisherigen Konzept dadurch, dass er keine Fehlerbehebung in der Hardware vorsieht, sondern diese Aufgabe an die Software delegiert. Schließlich befasst sich die Dissertation mit der Fehlerbehandlung in Software, wie sie in schichtübergreifenden Methoden zu sehen ist. Sie behandelt die Leistung der replizierten Software-Ausführung in Many-Core-Plattformen. Es setzt auf hardwaregestützte Fehlererkennung und Fehlerbehandlung in der Software. Das Replika-bewusste Co-Scheduling wird vorgeschlagen, um eine hohe Performance bei replizierter Ausführung zu erreichen, was mit Standard-Echtzeit-Schedulern nicht möglich ist

    Resource Management Algorithms for Computing Hardware Design and Operations: From Circuits to Systems

    Get PDF
    The complexity of computation hardware has increased at an unprecedented rate for the last few decades. On the computer chip level, we have entered the era of multi/many-core processors made of billions of transistors. With transistor budget of this scale, many functions are integrated into a single chip. As such, chips today consist of many heterogeneous cores with intensive interaction among these cores. On the circuit level, with the end of Dennard scaling, continuously shrinking process technology has imposed a grand challenge on power density. The variation of circuit further exacerbated the problem by consuming a substantial time margin. On the system level, the rise of Warehouse Scale Computers and Data Centers have put resource management into new perspective. The ability of dynamically provision computation resource in these gigantic systems is crucial to their performance. In this thesis, three different resource management algorithms are discussed. The first algorithm assigns adaptivity resource to circuit blocks with a constraint on the overhead. The adaptivity improves resilience of the circuit to variation in a cost-effective way. The second algorithm manages the link bandwidth resource in application specific Networks-on-Chip. Quality-of-Service is guaranteed for time-critical traffic in the algorithm with an emphasis on power. The third algorithm manages the computation resource of the data center with precaution on the ill states of the system. Q-learning is employed to meet the dynamic nature of the system and Linear Temporal Logic is leveraged as a tool to describe temporal constraints. All three algorithms are evaluated by various experiments. The experimental results are compared to several previous work and show the advantage of our methods

    Rede intra-chip com previsibilidade de latência para uso em sistemas de tempo real

    Get PDF
    Tese (doutorado) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia de Automação e Sistemas, Florianópolis, 2015.Sistemas intra-chip ou SoC (acrônimo de Systems-on-Chip) com múltiplas unidades de processamento heterogêneas têm sido usados pela indústria de silício como solução para disponibilizar o desempenho demandado pelas modernas aplicações multimídia. No entanto, a integração de um crescente número de unidades de processamento especializadas em um mesmo SoC impõem um desafio para os mecanismos de interconexão de tais sistemas, que agora são obrigados a lidar com um grande número de fluxos de comunicação muito distintos, com requisitos de latência e largura de banda também muito distintos. Como solução, a indústria do silício vem utilizando redes intra-chip ou NoCs (acrônimo de Networks-on-Chip) com previsibilidade de latência para interligar tais unidades de processamento neste tipo de SoC. No entanto, muitas aplicações neste domínio obteriam mais benefícios de uma NoC que pudesse otimizar a utilização dos recursos para fluxos multimídia que toleram variações razoáveis na Qualidade de Serviço (em inglês Quality of Service - QoS). Será demonstrado ao longo deste documento que muitos destes sistemas são concebidos em torno de alguns fluxos de comunicações de tempo real muito restritos, que precisam ser tratados dentro de limites de tempo rigorosos (muitas vezes envolvendo comandos para o controle do sistema ou tarefas de sinalização de estado do sistema) e um grande número de fluxos multimídia menos restritos, que toleram variações muito maiores na latência e na largura de banda. A estratégia de projeto de NoCs predominante na literatura para produzir interconexões para SoCs de tempo real baseia-se no mapeamento dos requisitos de comunicação de tarefas em tempo real (por vezes implementadas em hardware como componentes de propriedade intelectual dedicados) para os recursos de rede disponíveis em fases iniciais do projeto. Este mapeamento, no entanto, muitas vezes é realizado considerando um cenário de pior caso e, portanto, resulta em reserva de recursos que poderiam ser dinamicamente realocados para outros fluxos. Embora adequado para aplicações críticas de tempo real, esta estratégia resulta na má utilização de silício para aplicações multimídia com taxa de bits variável. Neste contexto, esta Tese apresenta uma rede que oferece previsibilidade na latência de pior caso, denominada de RTSNoC, e que foi projetada para o cenário no qual o sistema possui poucos fluxos de comunicação com restrições de tempo real rígidas, relacionados ao controle do sistema, e muitos fluxos de comunicação multimídia com restrições de tempo real menos rígidas. Na verdade, uma latência de pior caso para tais fluxos multimídia pode ser determinado em tempo de projeto, de modo que os projetistas poderiam de fato modelar os fluxos de multimídia como sendo de tempo real suave (ou soft real-time), cuja degradação é proporcional à quantidade de fluxos flui ao longo da rede. No entanto, uma vez que a estratégia de roteamento adotada na RTSNoC não usa qualquer tipo de reserva de recursos em tempo de execução, neste documento tais fluxos serão designados como sendo ?fluxos de melhor esforço? (em inglês Best Effort- BE). A arquitetura da rede proposta baseia-se na intercalação de flits provenientes de diferentes fluxos em um mesmo canal de comunicação entre roteadores da rede, de modo que cada flit contém informações de roteamento. Os resultados experimentais demonstram que a latência média de fluxos com variação na taxa de bits injetados na rede proposta é, em média, mais baixa do que em redes que executam a reserva de recursos e estão operando com 80% de tráfego oferecido. Além disso, é demonstrado analiticamente que fluxos de comunicação de tempo real projetados considerando o valor da latência de pior caso da rede sempre atenderão as restrições associadas a tarefas de tempo real rígidas, de modo que não há perda no limite de tempo para a execução de tais tarefas devido à contenção de recursos na rede.Abstract : Systems-on-Chip (SoC) with multiple heterogeneous processing unitshave been used by the silicon industry as means to deliver the performancerequired by modern multimedia applications. However, theintegration of an increasing number of specialized processing units posesa challenge on the interconnection mechanisms in such systems,which are now required to handle a large number of very distinctivecommunication ows, with very distinct latency and bandwidth requirements.As a solution, the silicon industry has been using predictableNetworks-on-Chip (NoC) to interconnect components in this kind ofSoC. Nevertheless, many applications in this domain would prot betterfrom a NoC that could optimize the utilization of resources formultimedia ows that tolerate reasonable variations in the Qualityof-Service(QoS). In this document will be shown that several systemshave been conceived around a few very strict real-time communicationows (often involving control or signalling tasks) and a large numberof less strict multimedia ows that tolerate much larger variations inlatency and bandwidth. In this context, current real-time NoC designsfall short at making good use of hardware resources as they rely onworst-case resource reservation. The prevailing design strategy to produceinterconnects for such SoCs relies on mapping the communicationrequirements of real-time tasks (sometimes implemented in hardwareas dedicated IPs) to available network resources at early design stages.This mapping, however, is often performed considering a worst-casescenario and therefore results in the reservation of resources that couldotherwise by dynamically reallocated to other ows. Although adequatefor critical real-time applications, this strategy results in poorsilicon utilization for variable-bit-rate multimedia applications. Thisdocument presents a Worst-Case Latency (WCL) of a network calledRTSNoC that was designed with the aforementioned scenario in mind:few hard real-time control ows and many best-eort multimedia ows.Indeed, a worst-case latency for such best-eort ows can be determinedat design-time, so designers could indeed model the multimediaows as soft real-time (or QoS) ows whose degradation is proportionalto the amount of streams owing across the chip. However, sincethe routing strategy does not use any kind of resource reservation atrun-time, this document will refers to those ows as best-eort. Theproposed NoC architecture is based on the interleaving of its fromdierent ows in the same communication channel between routers, soeach its carries along routing information. Experimental result showedthat the worst-case latency in RTSNoC network was, in average, lowerthan NoC that adopt resources reservation, when those networks areworking over 80% of oered load. Furthermore, it was analytically demonstratedthat the communication ows related to real-time designedconsidering the worst-case latency of the network always will achievethe restrictions related to hard real-time tasks. It means that there isno deadline lost for the execution of those tasks due to the contentionof network resources
    corecore