11 research outputs found

    Packetizing OCP Transactions in the MANGO Network-on-Chip


    An OCP Compliant Network Adapter for GALS-based SoC Design Using the MANGO Network-on-Chip

    The demand for IP reuse and system-level scalability in System-on-Chip (SoC) designs is growing. Network-on-chip (NoC) constitutes a viable solution space for emerging SoC design challenges. In this paper we describe an OCP compliant network adapter (NA) architecture for the MANGO NoC. The NA decouples communication and computation, providing memory-mapped OCP transactions based on primitive message-passing services of the network. It also facilitates GALS-type systems by adapting to the clockless network. This helps leverage a modular SoC design flow. We evaluate the performance and cost of 0.13 µm CMOS standard cell instantiations of the architecture.

    Asynchronous design of Networks-on-Chip


    Quarc: an architecture for efficient on-chip communication

    The exponential downscaling of feature size has forced a paradigm shift from computation-based design to communication-based design in system on chip development. Buses, the traditional communication architecture in systems on chip, cannot meet the increasing bandwidth requirements of future large systems. Networks on chip have emerged as an interconnection architecture offering unique solutions to the technological and design issues related to communication in future systems on chip. The transition from buses as a shared medium to networks on chip as a segmented medium has given rise to new challenges in the system on chip realm. By leveraging the shared nature of the communication medium, buses have been highly efficient in delivering multicast communication. The segmented nature of networks, however, prevents multicast messages from being delivered as efficiently by networks on chip. Relying on extensive research on multicast communication in parallel computers, several network on chip architectures have offered mechanisms to perform the operation while conforming to the resource constraints of the network on chip paradigm. In the majority of these networks on chip, multicast communication is implemented by establishing a connection between the source and all multicast destinations before message transmission commences. Establishing the connections incurs an overhead and is therefore undesirable, particularly in latency-sensitive services such as cache coherence. To address high-performance multicast communication, this research presents Quarc, a novel network on chip architecture. The Quarc architecture targets an area-efficient, low-power, high-performance implementation. The thesis gives a detailed description of the building blocks of the architecture, including the topology, router and network interface.
    A cost and performance comparison of the Quarc architecture against other network on chip architectures shows that Quarc is highly efficient. Moreover, the thesis introduces novel performance models of complex traffic patterns, including multicast and quality of service-aware communication.

    The MANGO clockless network-on-chip: Concepts and implementation


    Network Interface Design for Network-on-Chip

    In the culture of globalized integrated circuit (IC, a.k.a. chip) production, the use of Intellectual Property (IP) cores, computer-aided design (CAD) tools and testing services from untrusted vendors is prevalent as a way to reduce time to market. Unfortunately, the globalized business model creates opportunities for adversaries to tamper with and modify hardware; such tampering is known as a hardware Trojan (HT). Network-on-chip (NoC) has emerged as an efficient on-chip communication infrastructure. In this work, the security aspects of the NoC network interface (NI), one of the most critical components in a NoC, are investigated and presented. In particular, the NI design, hardware attack models and countermeasures for the NI in a NoC system are explored. First, an OCP compatible NI is implemented in an IBM 0.18 µm CMOS technology. The synthesis results are presented and compared with existing literature. Second, comprehensive hardware attack models targeting the NI are presented from system level to circuit level. The impact of hardware Trojans on NoC functionality and performance is evaluated. Finally, a countermeasure method is proposed to address hardware attacks on NIs.

    Rede intra-chip com previsibilidade de latência para uso em sistemas de tempo real

    Thesis (doctorate) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia de Automação e Sistemas, Florianópolis, 2015. Systems-on-Chip (SoC) with multiple heterogeneous processing units have been used by the silicon industry as a means to deliver the performance required by modern multimedia applications. However, the integration of an increasing number of specialized processing units poses a challenge for the interconnection mechanisms in such systems, which are now required to handle a large number of very distinct communication flows, with very distinct latency and bandwidth requirements. As a solution, the silicon industry has been using predictable Networks-on-Chip (NoC) to interconnect components in this kind of SoC. Nevertheless, many applications in this domain would profit more from a NoC that could optimize the utilization of resources for multimedia flows that tolerate reasonable variations in Quality of Service (QoS). This document shows that several systems have been conceived around a few very strict real-time communication flows (often involving control or signalling tasks) and a large number of less strict multimedia flows that tolerate much larger variations in latency and bandwidth. In this context, current real-time NoC designs fall short of making good use of hardware resources, as they rely on worst-case resource reservation.
    The prevailing design strategy for producing interconnects for such SoCs relies on mapping the communication requirements of real-time tasks (sometimes implemented in hardware as dedicated IPs) to available network resources at early design stages. This mapping, however, is often performed considering a worst-case scenario and therefore results in the reservation of resources that could otherwise be dynamically reallocated to other flows. Although adequate for critical real-time applications, this strategy results in poor silicon utilization for variable-bit-rate multimedia applications. This document presents RTSNoC, a network with predictable worst-case latency (WCL) that was designed with the aforementioned scenario in mind: few hard real-time control flows and many best-effort multimedia flows. Indeed, a worst-case latency for such best-effort flows can be determined at design time, so designers could model the multimedia flows as soft real-time (or QoS) flows whose degradation is proportional to the number of streams flowing across the chip. However, since the routing strategy does not use any kind of resource reservation at run time, this document refers to those flows as best-effort (BE). The proposed NoC architecture is based on interleaving flits from different flows in the same communication channel between routers, so each flit carries routing information.
    Experimental results showed that the worst-case latency in the RTSNoC network was, on average, lower than in NoCs that adopt resource reservation when those networks operate above 80% of offered load. Furthermore, it was analytically demonstrated that real-time communication flows designed considering the worst-case latency of the network will always meet the constraints of hard real-time tasks, meaning that no deadline is missed in the execution of those tasks due to contention for network resources.

    NoC Prototyping on FPGAs: Component Design, Architecture Implementation and Comparison

    Continuing improvements in integrated circuit technology over the past few decades have enabled increasingly large and complex Systems-on-Chip. Due to the large number of components used, the traditional bus-based interconnect scheme becomes cumbersome and restrictive. Hence, the Network-on-Chip interconnect paradigm becomes appealing due to its many advantages, such as scalability and superior performance. Much research remains to be done exploring NoC architectures using real-world benchmarks. In this thesis we describe the design space exploration of two major NoC components: a flexible adapter based on the Altera Avalon standard and a parameterizable wormhole router. Two well-known NoC architectures, torus and ring, were synthesized for Altera FPGAs using these NoC components. The architectures were compared on the basis of packet latency, area and throughput, using a benchmark application. Simulation results show that the ring architecture gives superior area versus performance tradeoffs for the benchmark used.

    Systematische Transaction-Level-Kommunikations-Modellierung mit SystemC

    An emerging approach to embedded system design is to assemble systems from a library of hardware and software component models (IP, intellectual property) using a system description language such as SystemC. SystemC allows the communication among IPs to be described in terms of abstract operations (transactions). The promise is that with transaction-level modeling (TLM), future systems-on-chip with one billion transistors and more can be composed out of IPs as simply as playing with LEGO bricks. However, reality is far from this ideal. In fact, each IP vendor promotes its own proprietary interface standard and the provided design tools lack compatibility, so heterogeneous IPs cannot be integrated efficiently. A novel generic interconnect fabric for TLM is presented which aims at enabling inter-operation between models at different levels of abstraction (mixed-mode) and models with different interfaces (heterogeneous components), with as little overhead as possible. A generic, protocol-independent representation of transactions is developed, along with an abstraction level formalism. This approach is shown to support systematic simulation of state-of-the-art buses and networks-on-chip such as IBM CoreConnect and PCI Express over several levels of TLM abstraction. A layered simulation framework for SystemC, GreenBus, is developed to examine the proposed concepts. The thesis discusses new implementation techniques for communication modeling with SystemC which outperform the existing approaches in terms of flexibility, simulation accuracy, and performance. Based on these techniques, advanced concepts for TLM-based hardware/software co-design and FPGA prototyping are examined.
    Several experiments and a video processor case study highlight the efficiency of the approach and show its applicability in a TLM design flow.

    Gradual Synchronization

    A synchronization solution is developed to allow finer-grained segmentation of clock domains on a chip. This solution, called Gradual Synchronization, incorporates computation into the synchronization overhead time. With Gradual Synchronization as the synchronization method, a chip design could easily mix asynchronous and synchronous blocks of logic, paving the way for wider use of asynchronous logic design.