    A Reactive and Cycle-True IP Emulator for MPSoC Exploration

    The design of MultiProcessor Systems-on-Chip (MPSoC) emphasizes intellectual-property (IP)-based, communication-centric approaches. Therefore, to optimize the MPSoC interconnect, the designer must develop traffic models that realistically capture the application behavior as it executes on the IP core. In this paper, we introduce a Reactive IP Emulator (RIPE) that enables effective emulation of IP-core behavior in multiple environments, including bit- and cycle-true simulation. The RIPE is built as a multithreaded abstract instruction-set processor, and it can generate reactive traffic patterns. We compare the RIPE models with cycle-true functional simulation of complex application behavior (task synchronization, multitasking, and input/output operations). Our results demonstrate high accuracy and significant speedups. Furthermore, via a case study, we show the potential use of the RIPE in a design-space-exploration context.
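The abstract describes the RIPE only as a multithreaded abstract instruction-set processor that generates reactive traffic. A minimal sketch of that idea follows. It assumes a hypothetical four-instruction set (`IDLE`, `WRITE`, `READ`, `POLL`), a fixed response latency, and one transaction per cycle on a shared port; none of these details come from the paper, and this is not RIPE's actual instruction set. It shows what "reactive" means: how often the consumer thread polls depends on when the producer thread writes its flag.

```python
from dataclasses import dataclass

# Hypothetical abstract instruction set (NOT RIPE's actual ISA):
#   ("IDLE", n)           model n cycles of local computation, no traffic
#   ("WRITE", addr, val)  issue a write transaction
#   ("READ", addr)        issue a read transaction
#   ("POLL", addr, val)   read addr repeatedly until it holds val (reactive sync)

@dataclass
class Thread:
    name: str
    program: list
    pc: int = 0          # index of the next instruction
    busy_until: int = 0  # first cycle at which the thread may issue again

    def done(self):
        return self.pc >= len(self.program)

def emulate(threads, memory, latency=2, max_cycles=1000):
    """Round-robin issue of at most one transaction per cycle on a shared port.
    Returns the generated traffic as (cycle, thread, op, addr) tuples."""
    trace = []
    rr = 0
    for cycle in range(max_cycles):
        if all(t.done() for t in threads):
            break
        for i in range(len(threads)):
            t = threads[(rr + i) % len(threads)]
            if t.done() or t.busy_until > cycle:
                continue
            op, *args = t.program[t.pc]
            if op == "IDLE":
                t.busy_until = cycle + args[0]
                t.pc += 1
                continue  # computation issues no traffic; another thread may use the port
            addr = args[0]
            trace.append((cycle, t.name, op, addr))
            t.busy_until = cycle + latency  # block until the response returns
            if op == "WRITE":
                memory[addr] = args[1]  # simplification: a write takes effect at issue
                t.pc += 1
            elif op == "READ":
                t.pc += 1
            elif op == "POLL" and memory.get(addr) == args[1]:
                t.pc += 1  # flag observed; otherwise the same POLL is re-issued later
            rr = (rr + i + 1) % len(threads)  # next search starts after the issuer
            break
    return trace

memory = {}
producer = Thread("prod", [("IDLE", 3), ("WRITE", 0x20, 42), ("WRITE", 0x10, 1)])
consumer = Thread("cons", [("POLL", 0x10, 1), ("READ", 0x20)])
trace = emulate([producer, consumer], memory)
```

In this run the consumer issues four `POLL` reads before it observes the producer's flag; a plain trace replay would repeat a fixed number of polls regardless of how the interconnect timing changes.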

    Parameterizable network-on-chip emulation framework

    Networks-on-Chip (NoCs) have been proposed as a promising solution to complex on-chip communication problems. However, there is no publicly accessible, HDL-synthesizable NoC framework that connects industrial-level cores and runs real applications on them. Moreover, many challenging research problems remain unsolved at all levels of design abstraction: design exploration of NoC architectures for applications; scheduling and mapping algorithms; evaluation of switching, topology, or routing algorithms for efficient application execution; and optimization of communication cost, area, energy, etc. Solving these problems calls for a synthesizable, parameterizable NoC framework in which such research problems and algorithms can be evaluated and implemented with ease and flexibility. The proposed NoC framework has been used to evaluate the following algorithms and architectural variations: i) switching algorithms: the latency, congestion, area, and power of Wormhole (WH) and Store-and-Forward (SF) switching are compared; ii) router architecture: an efficient virtual-channel architecture with loopback is proposed for SF routing to improve throughput, latency, and area; iii) static routing: a simple, efficient, and deadlock-free routing algorithm called “Mirror Routing” is proposed for torus architectures to reduce congestion; iv) adaptive routing: an adaptive routing algorithm is proposed and evaluated for the WK topology. The simulation results show that wormhole routing achieves lower latency than store-and-forward, with relatively lower area and power as well. A study of different traffic scenarios with different virtual-channel architectures under store-and-forward routing shows a considerable latency improvement for the virtual-channel architecture with loopback.
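Why wormhole switching shows lower latency than store-and-forward can be seen from the standard contention-free (zero-load) latency model. The sketch below assumes one flit per cycle per link, a fixed per-router delay, and no contention; it is a textbook approximation, not the framework's measured results.

```python
def store_and_forward_latency(hops, router_delay, flits):
    """Zero-load latency in cycles: every router receives the whole packet
    before forwarding it, so the serialization delay is paid at every hop."""
    return hops * (router_delay + flits)

def wormhole_latency(hops, router_delay, flits):
    """Zero-load latency in cycles: the head flit pipelines through the routers
    and the body flits follow it, so the serialization delay is paid once."""
    return hops * router_delay + flits

# Example: 4 hops, 2-cycle routers, 8-flit packet.
sf = store_and_forward_latency(4, 2, 8)  # 4 * (2 + 8) = 40 cycles
wh = wormhole_latency(4, 2, 8)           # 4 * 2 + 8  = 16 cycles
```

The same model hints at the area result: a store-and-forward input buffer must hold a whole packet, whereas a wormhole buffer can be only a few flits deep.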
It is also demonstrated that the proposed Mirror Routing algorithm can handle a single congestion point or fault in the routing path. Latency increases with the size of the torus. The adaptive routing algorithm proposed for the WK topology increases latency, but it can be considered in scenarios where the receiver node at the congested link is comparatively slow or where a link fault is permanent.
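The abstract does not describe how Mirror Routing works, so the following is not that algorithm. It only illustrates a general torus property relevant to the single-fault claim: along each ring dimension a destination is reachable in either direction, so a router can fall back to the opposite wraparound direction when a link on the minimal path is congested or faulty. The function name and the directed-link fault set are hypothetical.

```python
def ring_route(src, dst, k, faulty=frozenset()):
    """Route on one k-node ring of a torus. Tries the minimal direction first and
    falls back to the opposite (wraparound) direction if a link on the way is faulty.
    `faulty` holds directed links (a, b). Returns the list of visited nodes."""
    if src == dst:
        return [src]
    fwd = (dst - src) % k                 # hops needed in the +1 direction
    first = 1 if fwd <= k - fwd else -1   # minimal direction; ties go +1
    for step in (first, -first):
        path, node = [src], src
        while node != dst:
            nxt = (node + step) % k
            if (node, nxt) in faulty:
                break                     # blocked: abandon this direction
            path.append(nxt)
            node = nxt
        else:
            return path                   # reached dst without hitting a fault
    raise RuntimeError("both ring directions are blocked")

minimal = ring_route(0, 2, 8)                    # [0, 1, 2]
wrap = ring_route(0, 6, 8)                       # [0, 7, 6]
detour = ring_route(0, 2, 8, faulty={(1, 2)})    # [0, 7, 6, 5, 4, 3, 2]
```

The detour in the last case takes six hops instead of two; in general the fallback path is k minus the minimal distance long, so its cost grows with the ring size.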

    Design of complex integrated systems based on networks-on-chip: Trading off performance, power and reliability

    The steady advancement of microelectronics brings an escalating number of challenges for design engineers, due both to the tiny dimensions and to the enormous complexity of integrated systems. Against this background, this work deals with the Network-on-Chip (NoC) as the emerging design paradigm for coping with diverse issues of nanotechnology. The detailed investigations in the chapters focus on the communication-centric aspects of multi-core systems, with performance, power consumption, and reliability treated as equally essential design criteria.

    Rede intra-chip com previsibilidade de latência para uso em sistemas de tempo real (A latency-predictable network-on-chip for real-time systems)

    Doctoral thesis, Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia de Automação e Sistemas, Florianópolis, 2015. Systems-on-Chip (SoCs) with multiple heterogeneous processing units have been used by the silicon industry to deliver the performance demanded by modern multimedia applications. However, integrating a growing number of specialized processing units into a single SoC challenges the interconnection mechanisms of such systems, which must now handle a large number of very different communication flows with very different latency and bandwidth requirements. As a solution, the silicon industry has been using latency-predictable networks-on-chip (NoCs) to interconnect the processing units in this kind of SoC. However, many applications in this domain would benefit more from a NoC that could optimize resource utilization for multimedia flows that tolerate reasonable variations in Quality of Service (QoS). This document shows that many of these systems are built around a few very strict real-time communication flows, which must be handled within tight time bounds (often involving system-control commands or system-status signalling tasks), and a large number of less strict multimedia flows that tolerate much larger variations in latency and bandwidth.
The NoC design strategy that prevails in the literature for building interconnects for real-time SoCs relies on mapping the communication requirements of real-time tasks (sometimes implemented in hardware as dedicated intellectual-property components) onto the available network resources at early design stages. This mapping, however, is often performed for a worst-case scenario and therefore reserves resources that could otherwise be dynamically reallocated to other flows. Although adequate for critical real-time applications, this strategy results in poor silicon utilization for variable-bit-rate multimedia applications. In this context, this thesis presents a network with predictable worst-case latency, called RTSNoC, designed for the scenario in which the system has few communication flows with hard real-time constraints, related to system control, and many multimedia communication flows with softer real-time constraints. In fact, a worst-case latency for such multimedia flows can be determined at design time, so designers could indeed model the multimedia flows as soft real-time flows whose degradation is proportional to the number of flows traversing the network. However, since the routing strategy adopted in RTSNoC does not use any kind of resource reservation at run time, this document refers to such flows as "best-effort" (BE) flows. The proposed network architecture is based on interleaving flits from different flows on the same communication channel between routers, so that each flit carries routing information.
The experimental results show that the average latency of variable-bit-rate flows injected into the proposed network is, on average, lower than in networks that perform resource reservation and operate at 80% offered traffic. Furthermore, it is shown analytically that real-time communication flows designed using the network's worst-case latency always meet the constraints of hard real-time tasks, so no deadline of such tasks is missed because of contention for network resources.
Abstract: Systems-on-Chip (SoC) with multiple heterogeneous processing units have been used by the silicon industry as a means to deliver the performance required by modern multimedia applications. However, the integration of an increasing number of specialized processing units poses a challenge to the interconnection mechanisms in such systems, which are now required to handle a large number of very distinct communication flows, with very distinct latency and bandwidth requirements. As a solution, the silicon industry has been using predictable Networks-on-Chip (NoC) to interconnect components in this kind of SoC. Nevertheless, many applications in this domain would profit more from a NoC that could optimize the utilization of resources for multimedia flows that tolerate reasonable variations in Quality-of-Service (QoS). This document shows that several systems have been conceived around a few very strict real-time communication flows (often involving control or signalling tasks) and a large number of less strict multimedia flows that tolerate much larger variations in latency and bandwidth. In this context, current real-time NoC designs fall short at making good use of hardware resources, as they rely on worst-case resource reservation.
The prevailing design strategy for producing interconnects for such SoCs relies on mapping the communication requirements of real-time tasks (sometimes implemented in hardware as dedicated IPs) to the available network resources at early design stages. This mapping, however, is often performed for a worst-case scenario and therefore results in the reservation of resources that could otherwise be dynamically reallocated to other flows. Although adequate for critical real-time applications, this strategy results in poor silicon utilization for variable-bit-rate multimedia applications. This document presents the Worst-Case Latency (WCL) of a network called RTSNoC that was designed with the aforementioned scenario in mind: few hard real-time control flows and many best-effort multimedia flows. Indeed, a worst-case latency for such best-effort flows can be determined at design time, so designers could model the multimedia flows as soft real-time (or QoS) flows whose degradation is proportional to the number of streams flowing across the chip. However, since the routing strategy does not use any kind of resource reservation at run time, this document refers to those flows as best-effort. The proposed NoC architecture is based on interleaving flits from different flows in the same communication channel between routers, so each flit carries routing information. Experimental results showed that the worst-case latency of the RTSNoC network was, on average, lower than that of NoCs that adopt resource reservation when those networks operate above 80% offered load. Furthermore, it was demonstrated analytically that real-time communication flows designed considering the worst-case latency of the network always meet the constraints of hard real-time tasks. This means that no deadline is missed in the execution of those tasks due to contention for network resources.
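Interleaving flits from different flows on one router-to-router channel, with every flit carrying its own routing information, can be sketched as below. The abstract does not state RTSNoC's arbitration policy or flit format; round-robin arbitration and the `(x, y)` destination field are assumptions. Round-robin is chosen because it gives a simple per-hop bound: with N inputs competing, a backlogged flow waits at most N-1 slots between consecutive flits. Bounds of this kind are what a worst-case latency analysis can compose hop by hop; the thesis's actual analysis is not reproduced here.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Flit:
    flow: str    # flow the flit belongs to
    dest: tuple  # routing info carried by EVERY flit (hypothetical (x, y) format)
    seq: int     # position of the flit within its flow

def interleave(queues):
    """Round-robin arbitration of one output channel among per-input flit queues.
    Sends one flit per cycle and returns (cycle, flit) pairs in send order."""
    names = list(queues)
    rr, cycle, sent = 0, 0, []
    while any(queues[n] for n in names):
        for i in range(len(names)):
            n = names[(rr + i) % len(names)]
            if queues[n]:
                sent.append((cycle, queues[n].popleft()))
                rr = (rr + i + 1) % len(names)  # next search starts after the winner
                break
        cycle += 1
    return sent

queues = {
    "a": deque(Flit("a", (1, 2), s) for s in range(3)),
    "b": deque(Flit("b", (0, 3), s) for s in range(1)),
    "c": deque(Flit("c", (2, 0), s) for s in range(2)),
}
sent = interleave(queues)  # flows leave in the order a, b, c, a, c, a
```

Because each flit is self-routing, a downstream router can forward it without any per-flow state or reservation, which is what allows flows to share a channel flit by flit.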

    Analyses statistiques des communications sur puce (Statistical analyses of on-chip communications)

    This PhD thesis is composed of two main parts. The first focuses on Internet traffic modelling. From the analysis of many traffic traces, we proposed a parsimonious model (Gamma-Farima) adapted to aggregated throughput traces and valid over a wide range of aggregation levels. To produce synthetic traffic from this model, we also studied the generation of sample paths of non-Gaussian, long-memory stochastic processes. We then used the Gamma-Farima model to build an anomaly detection method. To this end, we introduced a multiresolution model that can distinguish regular traffic from malicious traffic (including a DDoS attack). This method was evaluated both on real traces and in simulation. Finally, we studied the production of long-range-dependent traffic in a network simulator (NS-2). The second part of this PhD deals with the analysis and synthesis of on-chip traffic, i.e., the traffic occurring in a system-on-chip. In such systems, the introduction of networks-on-chip (NoCs) has brought the interconnect to the top of the design flow. Prototyping these NoCs rapidly requires fast simulations, and replacing the components with traffic generators is a good way to achieve this. We therefore set up and developed a complete and flexible on-chip traffic generation environment that can replay a previously recorded trace, generate a random load on the network, produce stochastic traffic fitted to a reference trace, and take traffic phases into account. Because most of the traffic traces we obtained were non-stationary, they had to be split into reasonably stationary parts in order to perform a meaningful stochastic fit.
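Splitting a non-stationary trace into phases and fitting each phase can be sketched as follows. The thesis's own segmentation algorithm is not described in the abstract, so this swaps in plain binary segmentation on the mean (a standard change-point method) with a hypothetical `penalty` threshold. Each phase then gets a Gamma marginal fitted by the method of moments, and synthetic samples are drawn per phase. The long-memory (FARIMA) part of the Gamma-Farima model is deliberately omitted, so samples are independent within each phase.

```python
import random

def sse(seg):
    """Sum of squared deviations from the segment mean."""
    mu = sum(seg) / len(seg)
    return sum((v - mu) ** 2 for v in seg)

def binary_segmentation(x, penalty, min_len=8):
    """Recursively split x at the single mean shift that most reduces the squared
    error, stopping when the reduction is not above `penalty` (hypothetical knob).
    Returns (start, end) index pairs of approximately stationary phases."""
    segments = []

    def split(lo, hi):
        seg = x[lo:hi]
        total = sse(seg)
        best_gain, best_k = 0.0, None
        for k in range(min_len, len(seg) - min_len + 1):
            gain = total - sse(seg[:k]) - sse(seg[k:])
            if gain > best_gain:
                best_gain, best_k = gain, k
        if best_k is not None and best_gain > penalty:
            split(lo, lo + best_k)
            split(lo + best_k, hi)
        else:
            segments.append((lo, hi))

    split(0, len(x))
    return segments

def fit_gamma(seg):
    """Method-of-moments Gamma fit: shape = mean^2 / var, scale = var / mean.
    Assumes the phase has a positive mean and non-zero variance."""
    mu = sum(seg) / len(seg)
    var = sum((v - mu) ** 2 for v in seg) / len(seg)
    return mu * mu / var, var / mu

def synthesize(x, segments, rng):
    """Draw i.i.d. Gamma samples per phase (no FARIMA correlation structure)."""
    out = []
    for lo, hi in segments:
        shape, scale = fit_gamma(x[lo:hi])
        out.extend(rng.gammavariate(shape, scale) for _ in range(hi - lo))
    return out

# Toy throughput trace with three phases (levels 2, 10, 4) plus small jitter.
x = [base + 0.5 * (i % 2) for base in (2.0, 10.0, 4.0) for i in range(100)]
segments = binary_segmentation(x, penalty=50.0)  # [(0, 100), (100, 200), (200, 300)]
synthetic = synthesize(x, segments, random.Random(0))
```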
We performed many experiments in the SOCLIB simulation environment demonstrating that i) our traffic generation procedure is correct, ii) our segmentation algorithm provides promising results, and iii) multiphase stochastic traffic generation is a good tradeoff between trace replay and simple random traffic generation. Finally, we investigated the presence of long memory in the traces, as well as the impact of long memory on NoC performance.
This thesis consists of two parts. The first explores the problem of Internet traffic modelling. Based on the study of many traces, we proposed a model built on non-Gaussian, long-memory stochastic processes (Gamma-Farima) that accurately models aggregated throughput traces over a wide range of aggregation levels. To generate synthetic traffic, we proposed a method for synthesizing such processes. Then, starting from the Gamma-Farima model, we proposed a multiresolution model that can distinguish regular traffic from traffic containing an attack (of the distributed denial-of-service type). This allowed us to propose an anomaly detection method, which we evaluated on real traces and in simulation. Finally, we experimentally studied the problem of producing long-memory traffic in a network simulator (NS-2). The second part addresses traffic generation within systems-on-chip (SOC). In this field, the arrival of true networks-on-chip brings interconnect design to the forefront, and to speed up simulations, the components should be replaced by traffic generators.
We set up a complete on-chip traffic generation environment that can replay a trace, produce a random load on the network, produce stochastic traffic fitted to a reference trace, and take traffic phases into account. We ran many simulations in the academic SoC simulation environment SOCLIB, which allowed us to validate our approach and to evaluate our segmentation algorithm as well as the multiphase stochastic traffic generation we introduced. We also explored the presence of long memory in the traffic of on-chip processors, as well as the impact of this characteristic on network-on-chip performance.
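To make "long memory" concrete, the sketch below estimates the Hurst exponent H with the aggregated-variance method, a standard and simple estimator; the abstract does not say which estimator the thesis used, so this choice is an assumption. For short-memory traffic H is close to 0.5, while long memory gives 0.5 < H < 1. The second usage line also shows why the segmentation step matters: a non-stationary series such as a random walk drives the estimate toward (or above) 1 even though it is not a stationary long-memory process.

```python
import math
import random
from itertools import accumulate

def aggregated_variance_hurst(x, block_sizes):
    """Aggregated-variance estimate of the Hurst exponent H.
    For a long-memory series, Var(mean of blocks of size m) ~ m**(2H - 2),
    so H = 1 + slope / 2, with slope fitted on log Var versus log m."""
    log_m, log_v = [], []
    for m in block_sizes:
        k = len(x) // m                    # number of complete blocks
        if k < 2:
            continue
        means = [sum(x[i * m:(i + 1) * m]) / m for i in range(k)]
        mu = sum(means) / k
        var = sum((b - mu) ** 2 for b in means) / (k - 1)
        log_m.append(math.log(m))
        log_v.append(math.log(var))
    mx, my = sum(log_m) / len(log_m), sum(log_v) / len(log_v)
    slope = (sum((a - mx) * (b - my) for a, b in zip(log_m, log_v))
             / sum((a - mx) ** 2 for a in log_m))
    return 1 + slope / 2

rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(1 << 14)]  # i.i.d.: no long memory
walk = list(accumulate(noise))                         # non-stationary random walk
sizes = [2 ** j for j in range(2, 9)]                  # block sizes 4 .. 256
h_noise = aggregated_variance_hurst(noise, sizes)      # expected near 0.5
h_walk = aggregated_variance_hurst(walk, sizes)        # pushed toward (or above) 1
```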