377 research outputs found

    NoCo: ILP-based worst-case contention estimation for mesh real-time manycores

    Get PDF
    Manycores are capable of providing the computational demands required by functionally-advanced critical applications in domains such as automotive and avionics. In manycores a network-on-chip (NoC) provides access to shared caches and memories and hence concentrates most of the contention that tasks suffer, with effects on the worst-case contention delay (WCD) of packets and tasks' WCET. While several proposals minimize the impact of individual NoC parameters on WCD, e.g. mapping and routing, there are strong dependences among these NoC parameters. Hence, finding the optimal NoC configurations requires optimizing all parameters simultaneously, which represents a multidimensional optimization problem. In this paper we propose NoCo, a novel approach that combines ILP and stochastic optimization to find NoC configurations in terms of packet routing, application mapping, and arbitration weight allocation. Our results show that NoCo improves other techniques that optimize a subset of NoC parameters.This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2015- 65316-P and the HiPEAC Network of Excellence. It also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (agreement No. 772773). Carles Hernández is jointly supported by the MINECO and FEDER funds through grant TIN2014-60404-JIN. Jaume Abella has been partially supported by the Spanish Ministry of Economy and Competitiveness under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. Enrico Mezzetti has been partially supported by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva-Incorporaci´on postdoctoral fellowship number IJCI-2016-27396.Peer ReviewedPostprint (author's final draft

    Enforcing Predictability of Many-cores with DCFNoC

    Get PDF
    © 2021 IEEE. Personal use of this material is permitted. Permissíon from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertisíng or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.[EN] The ever need for higher performance forces industry to include technology based on multi-processors system on chip (MPSoCs) in their safety-critical embedded systems. MPSoCs include a network-on-chip (NoC) to interconnect the cores between them and with memory and the rest of shared resources. Unfortunately, the inclusion of NoCs compromises guaranteeing time predictability as network-level conflicts may occur. To overcome this problem, in this paper we propose DCFNoC, a new time-predictable NoC design paradigm where conflicts within the network are eliminated by design. This new paradigm builds on top of the Channel Dependency Graph (CDG) in order to deterministically avoid network conflicts. The network guarantees predictability to applications and is able to naturally inject messages using a TDM period equal to the optimal theoretical bound without the need of using a computationally demanding offline process. DCFNoC is integrated in a tile-based many-core system and adapted to its memory hierarchy. Our results show that DCFNoC guarantees time predictability avoiding network interference among multiple running applications. DCFNoC always guarantees performance and also improves wormhole performance in a 4 × 4 setting by a factor of 3.7× when interference traffic is injected. For a 8 × 8 network differences are even larger. In addition, DCFNoC obtains a total area saving of 10.79% over a standard wormhole implementation.This work has been supported by MINECO under Grant BES-2016-076885, by MINECO and funds from the European ERDF under Grant TIN2015-66972-C05-1-R and Grant RTI2018-098156-B-C51, and by the EC H2020 RECIPE project under Grant 801137.Picornell-Sanjuan, T.; Flich Cardo, J.; Hernández Luz, C.; Duato Marín, JF. (2021). Enforcing Predictability of Many-cores with DCFNoC. IEEE Transactions on Computers. 70(2):270-283. https://doi.org/10.1109/TC.2020.2987797S27028370

    Automatic synthesis and optimization of chip multiprocessors

    Get PDF
    The microprocessor technology has experienced an enormous growth during the last decades. Rapid downscale of the CMOS technology has led to higher operating frequencies and performance densities, facing the fundamental issue of power dissipation. Chip Multiprocessors (CMPs) have become the latest paradigm to improve the power-performance efficiency of computing systems by exploiting the parallelism inherent in applications. Industrial and prototype implementations have already demonstrated the benefits achieved by CMPs with hundreds of cores.CMP architects are challenged to take many complex design decisions. Only a few of them are:- What should be the ratio between the core and cache areas on a chip?- Which core architectures to select?- How many cache levels should the memory subsystem have?- Which interconnect topologies provide efficient on-chip communication?These and many other aspects create a complex multidimensional space for architectural exploration. Design Automation tools become essential to make the architectural exploration feasible under the hard time-to-market constraints. The exploration methods have to be efficient and scalable to handle future generation on-chip architectures with hundreds or thousands of cores.Furthermore, once a CMP has been fabricated, the need for efficient deployment of the many-core processor arises. Intelligent techniques for task mapping and scheduling onto CMPs are necessary to guarantee the full usage of the benefits brought by the many-core technology. These techniques have to consider the peculiarities of the modern architectures, such as availability of enhanced power saving techniques and presence of complex memory hierarchies.This thesis has several objectives. The first objective is to elaborate the methods for efficient analytical modeling and architectural design space exploration of CMPs. The efficiency is achieved by using analytical models instead of simulation, and replacing the exhaustive exploration with an intelligent search strategy. Additionally, these methods incorporate high-level models for physical planning. The related contributions are described in Chapters 3, 4 and 5 of the document.The second objective of this work is to propose a scalable task mapping algorithm onto general-purpose CMPs with power management techniques, for efficient deployment of many-core systems. This contribution is explained in Chapter 6 of this document.Finally, the third objective of this thesis is to address the issues of the on-chip interconnect design and exploration, by developing a model for simultaneous topology customization and deadlock-free routing in Networks-on-Chip. The developed methodology can be applied to various classes of the on-chip systems, ranging from general-purpose chip multiprocessors to application-specific solutions. Chapter 7 describes the proposed model.The presented methods have been thoroughly tested experimentally and the results are described in this dissertation. At the end of the document several possible directions for the future research are proposed

    Conflict-Free Networks on Chip for Real Time Systems

    Full text link
    [ES] La constante necesidad de un mayor rendimiento para cumplir con la gran demanda de potencia de cómputo de las nuevas aplicaciones, (ej. sistemas de conducción autónoma), obliga a la industria a apostar por la tecnología basada en Sistemas en Chip con Procesadores Multinúcleo (MPSoCs) en sus sistemas embebidos de seguridad-crítica. Los sistemas MPSoCs generalmente incluyen una red en el chip (NoC) para interconectar los núcleos de procesamiento entre ellos, con la memoria y con el resto de recursos compartidos. Desafortunadamente, el uso de las NoCs dificulta alcanzar la predecibilidad en el tiempo, ya que pueden aparecer conflictos en muchos puntos y de forma distribuida a nivel de red. Para afrontar este problema, en esta tesis se propone un nuevo paradigma de diseño para NoCs de tiempo real donde los conflictos en la red son eliminados por diseño. Este nuevo paradigma parte del Grafo de Dependencia de Canales (CDG) para evitar los conflictos de red de forma determinista. Nuestra solución es capaz de inyectar mensajes de forma natural usando un periodo TDM igual al límite teórico óptimo sin la necesidad de usar un proceso offline exigente computacionalmente. La red se ha integrado en un sistema multinúcleo basado en tiles y adaptado a su jerarquía de memoria. Como segunda contribución principal, proponemos un nuevo planificador dinámico y distribuido capaz de alcanzar un rendimiento pico muy cercanos a las NoC basadas en un diseño wormhole sin comprometer sus garantías de tiempo real. El planificador se basa en nuestro diseño de red para explotar sus propiedades clave. Los resultados de nuestra NoC muestran que nuestro diseño garantiza la predecibilidad en el tiempo evitando interferencias en la red entre múltiples aplicaciones ejecutándose concurrentemente. La red siempre garantiza el rendimiento y también mejora el rendimiento respecto al de las redes wormhole en una red 4 x 4 en un factor de 3,7x cuando se inyecta trafico para generar interferencias. En una red 8 x 8 las diferencias son incluso mayores. Además, la red obtiene un ahorro de área total del 10,79% frente a una implementación básica de una red wormhole. El planificador propuesto alcanza una mejora de rendimiento de 6,9x y 14,4x frente la versión básica de la red DCFNoC para redes en forma de malla de 16 y 64 nodos, respectivamente. Cuando lo comparamos frente a un conmutador estándar wormhole se preserva un rendimiento de red del 95% al mismo tiempo que preserva la estricta predecibilidad en el tiempo. Este logro abre la puerta a nuevos diseños de NoCs de alto rendimiento con predecibilidad en el tiempo. Como contribución final, construimos una taxonomía de NoCs basadas en TDM con propiedades de tiempo real. Con esta taxonomía realizamos un análisis exhaustivo para estudiar y comparar desde tiempos de respuesta, a implementaciones con bajo coste, pasando por soluciones de compromiso para diseños de NoCs de tiempo real. Como resultado, obtenemos nuevos diseños de NoCs basadas en TDM.[CA] La constant necessitat d'un major rendiment per a complir amb la gran demanda de potència de còmput de les noves aplicacions, (ex. sistemes de conducció autònoma), obliga la indústria a apostar per la tecnologia basada en Sistemes en Xip amb Processadors Multinucli (MPSoCs) en els seus sistemes embeguts de seguretat-crítica. Els sistemes MPSoCs generalment inclouen una xarxa en el xip (NoC) per a interconnectar els nuclis de processament entre ells, amb la memòria i amb la resta de recursos compartits. Desafortunadament, l'ús de les NoCs dificulta aconseguir la predictibilitat en el temps, ja que poden aparéixer conflictes en molts punts i de forma distribuïda a nivell de xarxa. Per a afrontar aquest problema, en aquesta tesi es proposa un nou paradigma de disseny per a NoCs de temps real on els conflictes en la xarxa són eliminats per disseny. Aquest nou paradigma parteix del Graf de Dependència de Canals (CDG) per a evitar els conflictes de xarxa de manera determinista. La nostra solució és capaç d'injectar missatges de mra natural fent ús d'un període TDM igual al límit teòric òptim sense la necessitat de fer ús d'un procés offline exigent computacionalment. La xarxa s'ha integrat en un sistema multinucli basat en tiles i adaptat a la seua jerarquia de memòria. Com a segona contribució principal, proposem un nou planificador dinàmic i distribuït capaç d'aconseguir un rendiment pic molt pròxims a les NoC basades en un disseny wormhole sense comprometre les seues garanties de temps real. El planificador es basa en el nostre disseny de xarxa per a explotar les seues propietats clau. Els resultats de la nostra NoC mostren que el nostre disseny garanteix la predictibilitat en el temps evitant interferències en la xarxa entre múltiples aplicacions executant-se concurrentment. La xarxa sempre garanteix el rendiment i també millora el rendiment respecte al de les xarxes wormhole en una xarxa 4 x 4 en un factor de 3,7x quan s'injecta trafic per a generar interferències. En una xarxa 8 x 8 les diferències són fins i tot majors. A més, la xarxa obté un estalvi d'àrea total del 10,79% front una implementació bàsica d'una xarxa wormhole. El planificador proposat aconsegueix una millora de rendiment de 6,9x i 14,4x front la versió bàsica de la xarxa DCFNoC per a xarxes en forma de malla de 16 i 64 nodes, respectivament. Quan ho comparem amb un commutador estàndard wormhole es preserva un rendiment de xarxa del 95% al mateix temps que preserva la estricta predictibilitat en el temps. Aquest assoliment obri la porta a nous dissenys de NoCs d'alt rendiment amb predictibilitat en el temps. Com a contribució final, construïm una taxonomia de NoCs basades en TDM amb propietats de temps real. Amb aquesta taxonomia realitzem una anàlisi exhaustiu per a estudiar i comparar des de temps de resposta, a implementacions amb baix cost, passant per solucions de compromís per a dissenys de NoCs de temps real. Com a resultat, obtenim nous dissenys de NoCs basades en TDM.[EN] The ever need for higher performance to cope with the high computational power demands of new applications (e.g autonomous driving systems), forces industry to support technology based on multi-processors system on chip (MPSoCs) in their safety-critical embedded systems. MPSoCs usually include a network-on-chip (NoC) to interconnect the cores between them and, with memory and the rest of shared resources. Unfortunately, the inclusion of NoCs difficults achieving time predictability as network-level conflicts may occur in many points in a distributed manner. To overcome this problem, this thesis proposes a new time-predictable NoC design paradigm where conflicts within the network are eliminated by design. This new paradigm builds on top of the Channel Dependency Graph (CDG) in order to deterministically avoid network conflicts. Our solution is able to naturally inject messages using a TDM period equal to the optimal theoretical bound without the need of using a computationally demanding offline process. The network is integrated in a tile-based manycore system and adapted to its memory hierarchy. As a second main contribution, we propose a novel distributed dynamic scheduler that is able to achieve peak performance close to a wormhole-based NoC design without compromising its real-time guarantees. The scheduler builds on top of our NoC design to exploit its key properties. The results of our NoC show that our design guarantees time predictability avoiding network interference among multiple running applications. The network always guarantees performance and also improves wormhole performance in a 4 x 4 setting by a factor of 3.7x when interference traffic is injected. For a 8 x 8 network differences are even larger. In addition, the network obtains a total area saving of 10.79% over a standard wormhole implementation. The proposed scheduler achieves an overall throughput improvement of 6.9x and 14.4x over a baseline conflict-free NoC for 16 and 64-node meshes, respectively. When compared against a standard wormhole router 95% of its network throughput is preserved while strict timing predictability is kept. This achievement opens the door to new high performance time predictable NoC designs. As a final contribution, we build a taxonomy of TDM-based NoCs with real-time properties. With this taxonomy we perform a comprehensive analysis to study and compare from response time specific, to low resource implementation cost, through trade-off solutions for real-time NoCs designs. As a result, we derive new TDM-based NoC designs.Picornell Sanjuan, T. (2021). Conflict-Free Networks on Chip for Real Time Systems [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/177347TESI

    Network-on-Chip

    Get PDF
    Addresses the Challenges Associated with System-on-Chip Integration Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. This text comprises 12 chapters and covers: The evolution of NoC from SoC—its research and developmental challenges NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces The router design strategies followed in NoCs The evaluation mechanism of NoC architectures The application mapping strategies followed in NoCs Low-power design techniques specifically followed in NoCs The signal integrity and reliability issues of NoC The details of NoC testing strategies reported so far The problem of synthesizing application-specific NoCs Reconfigurable NoC design issues Direction of future research and development in the field of NoC Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, and researchers and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems

    Design and Implementation of High QoS 3D-NoC using Modified Double Particle Swarm Optimization on FPGA

    Get PDF
    One technique to overcome the exponential growth bottleneck is to increase the number of cores on a processor, although having too many cores might cause issues including chip overheating and communication blockage. The problem of the communication bottleneck on the chip is presently effectively resolved by networks-on-chip (NoC). A 3D stack of chips is now possible, thanks to recent developments in IC manufacturing techniques, enabling to reduce of chip area while increasing chip throughput and reducing power consumption. The automated process associated with mapping applications to form three-dimensional NoC architectures is a significant new path in 3D NoC research. This work proposes a 3D NoC partitioning approach that can identify the 3D NoC region that has to be mapped. A double particle swarm optimization (DPSO) inspired algorithmic technique, which may combine the characteristics having neighbourhood search and genetic architectures, also addresses the challenge of a particle swarm algorithm descending into local optimal solutions. Experimental evidence supports the claim that this hybrid optimization algorithm based on Double Particle Swarm Optimisation outperforms the conventional heuristic technique in terms of output rate and loss in energy. The findings demonstrate that in a network of the same size, the newly introduced router delivers the lowest loss on the longest path.  Three factors, namely energy, latency or delay, and throughput, are compared between the suggested 3D mesh ONoC and its 2D version. When comparing power consumption between 3D ONoC and its electronic and 2D equivalents, which both have 512 IP cores, it may save roughly 79.9% of the energy used by the electronic counterpart and 24.3% of the energy used by the latter. The network efficiency of the 3D mesh ONoC is simulated by DPSO in a variety of configurations. The outcomes also demonstrate an increase in performance over the 2D ONoC. As a flexible communication solution, Network-On-Chips (NoCs) have been frequently employed in the development of multiprocessor system-on-chips (MPSoCs). By outsourcing their communication activities, NoCs permit on-chip Intellectual Property (IP) cores to communicate with one another and function at a better level. The important components in assigning application duties, distributing the work to the IPs, and coordinating communication among them are mapping and scheduling methods. This study aims to present an entirely advanced form of research in the area of 3D NoC mapping and scheduling applications, grouping the results according to various parameters and offering several suggestions for further research

    Decompose and Conquer: Addressing Evasive Errors in Systems on Chip

    Full text link
    Modern computer chips comprise many components, including microprocessor cores, memory modules, on-chip networks, and accelerators. Such system-on-chip (SoC) designs are deployed in a variety of computing devices: from internet-of-things, to smartphones, to personal computers, to data centers. In this dissertation, we discuss evasive errors in SoC designs and how these errors can be addressed efficiently. In particular, we focus on two types of errors: design bugs and permanent faults. Design bugs originate from the limited amount of time allowed for design verification and validation. Thus, they are often found in functional features that are rarely activated. Complete functional verification, which can eliminate design bugs, is extremely time-consuming, thus impractical in modern complex SoC designs. Permanent faults are caused by failures of fragile transistors in nano-scale semiconductor manufacturing processes. Indeed, weak transistors may wear out unexpectedly within the lifespan of the design. Hardware structures that reduce the occurrence of permanent faults incur significant silicon area or performance overheads, thus they are infeasible for most cost-sensitive SoC designs. To tackle and overcome these evasive errors efficiently, we propose to leverage the principle of decomposition to lower the complexity of the software analysis or the hardware structures involved. To this end, we present several decomposition techniques, specific to major SoC components. We first focus on microprocessor cores, by presenting a lightweight bug-masking analysis that decomposes a program into individual instructions to identify if a design bug would be masked by the program's execution. We then move to memory subsystems: there, we offer an efficient memory consistency testing framework to detect buggy memory-ordering behaviors, which decomposes the memory-ordering graph into small components based on incremental differences. We also propose a microarchitectural patching solution for memory subsystem bugs, which augments each core node with a small distributed programmable logic, instead of including a global patching module. In the context of on-chip networks, we propose two routing reconfiguration algorithms that bypass faulty network resources. The first computes short-term routes in a distributed fashion, localized to the fault region. The second decomposes application-aware routing computation into simple routing rules so to quickly find deadlock-free, application-optimized routes in a fault-ridden network. Finally, we consider general accelerator modules in SoC designs. When a system includes many accelerators, there are a variety of interactions among them that must be verified to catch buggy interactions. To this end, we decompose such inter-module communication into basic interaction elements, which can be reassembled into new, interesting tests. Overall, we show that the decomposition of complex software algorithms and hardware structures can significantly reduce overheads: up to three orders of magnitude in the bug-masking analysis and the application-aware routing, approximately 50 times in the routing reconfiguration latency, and 5 times on average in the memory-ordering graph checking. These overhead reductions come with losses in error coverage: 23% undetected bug-masking incidents, 39% non-patchable memory bugs, and occasionally we overlook rare patterns of multiple faults. In this dissertation, we discuss the ideas and their trade-offs, and present future research directions.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147637/1/doowon_1.pd

    Design Methods and Tools for Application-Specific Predictable Networks-on-Chip

    Get PDF
    As the complexity of applications grows with each new generation, so does the demand for computation power. To satisfy the computation demands at manageable power levels, we see a shift in the design paradigm from single processor systems to Multiprocessor Systems-on-Chip (MPSoCs). MPSoCs leverage the parallelism in applications to increase the performance at the same power levels. To further improve the computation to power consumption ratio, MPSoCs for embedded applications are heterogeneous and integrate cores that are specialized to perform the different functionalities of the application. With technology scaling, wire power consumption is increasing compared to logic, making communication as expensive as computation. Therefore customizing the interconnect is necessary to achieve energy efficiency. Designing an optimal application specific Network-on-Chip (NoC), that meets application demands, requires the exploration of a large design space. Automatic design and optimization of the NoC is required in order to achieve fast design closure, especially for heterogeneous MPSoCs. To continue to meet the computation requirements of future applications new technologies are emerging. Three dimensional integration promises to increase the number of transistors by stacking multiple silicon layers. This will lead to an increase in the number of cores of the MPSoCs resulting in increased communication demands. To compensate for the increase in the wire delay in new technology nodes as well as to reduce the power consumption further, multi-synchronous design is becoming popular. With multiple clock signals, different parts of the MPSoC can be clocked at different frequencies according to the current demands of the application and can even be shutdown when they are not used at all. This further complicates the design of the NoC.Many applications require different levels of guarantee from the NoC in order to perform their functionality correctly. As communication traffic patterns become more complex, the performance of the NoC can no longer be predicted statically. Therefore designing the interconnect network requires that such guarantees are provided during the dynamic operation of the system which includes the interaction with major subsystems (i.e., main memory) and not just the interconnect itself. In this thesis, I present novel methods to design application-specific NoCs that meet performance demands, under the constraints of new technologies. To provide different levels of Quality of Service, I integrate methods to estimate the NoC performance during the design phase of the interconnect topology. I present methods and architectures for NoCs to efficiently access memory systems, in order to achieve predictable operation of the systems from the point of view of the communication as well as the bottleneck target devices. Therefore the main contribution of the thesis is twofold: scientific as I propose new algorithms to perform topology synthesis and engineering by presenting extensive experiments and architectures for NoC design
    • …
    corecore