92 research outputs found

    NoCo: ILP-based worst-case contention estimation for mesh real-time manycores

    Manycores are capable of meeting the computational demands of functionally advanced critical applications in domains such as automotive and avionics. In manycores, a network-on-chip (NoC) provides access to shared caches and memories and hence concentrates most of the contention that tasks suffer, with effects on the worst-case contention delay (WCD) of packets and on tasks' WCET. While several proposals minimize the impact of individual NoC parameters on WCD, e.g. mapping and routing, there are strong dependences among these NoC parameters. Hence, finding the optimal NoC configuration requires optimizing all parameters simultaneously, which is a multidimensional optimization problem. In this paper we propose NoCo, a novel approach that combines ILP and stochastic optimization to find NoC configurations in terms of packet routing, application mapping, and arbitration weight allocation. Our results show that NoCo improves on other techniques that optimize only a subset of NoC parameters.
    This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2015-65316-P and the HiPEAC Network of Excellence. It also received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (agreement No. 772773). Carles Hernández is jointly supported by the MINECO and FEDER funds through grant TIN2014-60404-JIN. Jaume Abella has been partially supported by the Spanish Ministry of Economy and Competitiveness under Ramón y Cajal postdoctoral fellowship number RYC-2013-14717. Enrico Mezzetti has been partially supported by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva-Incorporación postdoctoral fellowship number IJCI-2016-27396.
    Peer Reviewed. Postprint (author's final draft).

    Run-time management of many-core SoCs: A communication-centric approach

    Single-core performance hit the power and complexity limits at the beginning of this century, moving the industry towards the design of multi- and many-core systems-on-chip (SoCs). On-chip communication between the cores plays a critical role in the performance of these SoCs, with power dissipation, communication latency, scalability to many cores, and reliability against transistor failures as the main design challenges. Accordingly, we dedicate this thesis to the communication-centered management of many-core SoCs, with the goal of advancing the state of the art in addressing these challenges. To this end, we contribute to on-chip communication of many-core SoCs in three main directions. First, we start with a synthesizable SoC with full system simulation. We demonstrate the importance of the networking overhead in a practical system, and propose a sophisticated network interface (NI) that offloads work from software to hardware. Our results show around 5x and up to 50x higher network performance compared to previous works. As the second direction of this thesis, we study the significance of run-time application mapping. We demonstrate that contiguous application mapping not only improves network latency (by 23%) and power dissipation (by 50%), but also improves system throughput (by 3%) and the quality-of-service (QoS) of soft real-time applications (up to 100x fewer deadline misses). In addition, our hierarchical run-time application mapping achieves 99.41% successful mappings when up to 8 links are broken. As the final direction of the thesis, we propose a fault-tolerant routing algorithm, maze-routing. It is the first-in-class algorithm that provides guaranteed delivery, a fully distributed solution, low area overhead (by 16x), and instantaneous reconfiguration (vs. 40K cycles of downtime in previous works), all at the same time.
    Besides the individual goals of each contribution, when applicable, we ensure that our solutions scale to extreme network sizes like 12x12 and 16x16. This thesis concludes that the communication overhead and its optimization play a significant role in the performance of many-core SoCs.
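    The abstract credits contiguous application mapping with lower network latency. One way to see why is to compare the average hop distance between the tiles of one application under two placements. The sketch below is only an illustration under assumed inputs: the 8x8 mesh, the tile coordinates, and the `avg_hops` helper are hypothetical and are not taken from the thesis.

```python
from itertools import combinations


def avg_hops(tiles):
    """Average Manhattan distance over all pairs of (x, y) tiles."""
    pairs = list(combinations(tiles, 2))
    total = sum(abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in pairs)
    return total / len(pairs)


# Four tasks of one application on an 8x8 mesh (illustrative coordinates).
contiguous = [(0, 0), (1, 0), (0, 1), (1, 1)]  # a 2x2 block of tiles
scattered = [(0, 0), (7, 0), (0, 7), (7, 7)]   # the four mesh corners

print(avg_hops(contiguous))
print(avg_hops(scattered))
```

    In this toy case the scattered placement needs several times more hops per message than the 2x2 block. Hop count is only a proxy, so the gap does not translate directly into the latency figures reported above.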

    A Brief Comment on "A Complete Self-Testing and Self-Configuring NoC Infrastructure for Cost-Effective MPSoCs"

    © ACM, 2015. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Embedded Computing Systems, Vol. 14, No. 1, Article 2, January 2015. http://doi.acm.org/10.1145/2668121
    In the Ghiribaldi et al. [2013] paper, a complete self-testing and self-configuring NoC infrastructure for cost-effective MPSoCs was presented in order to make the NoC architecture tolerant to faults. To overcome the complexity involved in the complete reconfiguration of routing instances in the face of most of the usual failure patterns, Ghiribaldi et al. [2013] proposed a fast self-reconfiguration algorithm. The algorithm is based on segment-based routing implemented using Logic-Based Distributed Routing (LBDR) and was claimed to handle the most common NoC faults. The purpose of this comment is to demonstrate the inconsistency of the fast self-reconfiguration method presented in Ghiribaldi et al. [2013]. To resolve the inconsistency, we present the correct set of LBDR bits and also argue that complete reconfiguration of the routing instance is mandatory to handle some fault combinations. New coverage results of the fast self-reconfiguration algorithm of Ghiribaldi et al. [2013] are also presented.
    This work is supported by the Indo-Spain DST project under grant DST/INT/Spain/P35/11/1 and the Spanish Ministerio de Economía y Competitividad (MINECO) under grant PRI-PIBIN-2011-0989.
    Bishnoi, R.; Laxmi, V.; Gaur, MS.; Flich Cardo, J.; Trivino, F. (2015). A Brief Comment on "A Complete Self-Testing and Self-Configuring NoC Infrastructure for Cost-Effective MPSoCs". ACM Transactions on Embedded Computing Systems. 14(1):1-9. https://doi.org/10.1145/2668121
    References:
    A. Ghiribaldi, D. Ludovici, F. Triviño, A. Strano, J. Flich, J. L. Sánchez, F. Alfaro, M. Favalli, and D. Bertozzi. 2013. A complete self-testing and self-configuring NoC infrastructure for cost-effective MPSoCs. ACM Trans. Embed. Comput. Syst. 12, 4 (July 2013), 106:1--106:29. DOI: http://dx.doi.org/10.1145/2485984.2485994
    A. Mejia. 2008. Design and Implementation of Efficient Topology Agnostic Routing Algorithms for Interconnection Networks. Ph.D. Dissertation. University of Valencia.
    A. Mejia, J. Flich, and J. Duato. 2008. On the potentials of segment-based routing for NoCs. In Proceedings of the 37th International Conference on Parallel Processing (ICPP'08). IEEE, 594--603. DOI: http://dx.doi.org/10.1109/ICPP.2008.56
    S. Rodrigo, S. Medardoni, J. Flich, D. Bertozzi, and J. Duato. 2009. Efficient implementation of distributed routing algorithms for NoCs. IET Comput. Digital Techn. 3, 5 (2009), 460--475. DOI: http://dx.doi.org/10.1049/iet-cdt.2008.0092
    A. Strano, D. Bertozzi, F. Trivino, J. L. Sanchez, F. J. Alfaro, and J. Flich. 2012. OSR-Lite: Fast and deadlock-free NoC reconfiguration framework. In Proceedings of the International Conference on Embedded Computer Systems (SAMOS'12). 86--95. DOI: http://dx.doi.org/10.1109/SAMOS.2012.640416

    Conflict-Free Networks on Chip for Real Time Systems

    The ever-growing need for higher performance to cope with the high computational power demands of new applications (e.g. autonomous driving systems) forces industry to adopt technology based on multi-processor systems-on-chip (MPSoCs) in its safety-critical embedded systems. MPSoCs usually include a network-on-chip (NoC) to interconnect the cores with each other, with memory, and with the rest of the shared resources. Unfortunately, the inclusion of NoCs makes it difficult to achieve time predictability, as network-level conflicts may occur at many points in a distributed manner. To overcome this problem, this thesis proposes a new time-predictable NoC design paradigm where conflicts within the network are eliminated by design. This new paradigm builds on top of the Channel Dependency Graph (CDG) in order to deterministically avoid network conflicts. Our solution is able to naturally inject messages using a TDM period equal to the optimal theoretical bound without the need for a computationally demanding offline process. The network is integrated in a tile-based manycore system and adapted to its memory hierarchy. As a second main contribution, we propose a novel distributed dynamic scheduler that is able to achieve peak performance close to that of a wormhole-based NoC design without compromising its real-time guarantees. The scheduler builds on top of our NoC design to exploit its key properties. The results of our NoC show that our design guarantees time predictability, avoiding network interference among multiple concurrently running applications. The network always guarantees performance and also improves on wormhole performance in a 4 x 4 setting by a factor of 3.7x when interference traffic is injected. For an 8 x 8 network the differences are even larger. In addition, the network obtains a total area saving of 10.79% over a standard wormhole implementation. The proposed scheduler achieves an overall throughput improvement of 6.9x and 14.4x over a baseline conflict-free NoC (DCFNoC) for 16- and 64-node meshes, respectively. When compared against a standard wormhole router, 95% of its network throughput is preserved while strict timing predictability is kept. This achievement opens the door to new high-performance time-predictable NoC designs. As a final contribution, we build a taxonomy of TDM-based NoCs with real-time properties. With this taxonomy we perform a comprehensive analysis to study and compare designs ranging from response-time-oriented solutions to low-cost implementations, including trade-off solutions for real-time NoC designs. As a result, we derive new TDM-based NoC designs.
    Picornell Sanjuan, T. (2021). Conflict-Free Networks on Chip for Real Time Systems [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/177347

    Resilient Routing Implementation in 2D Mesh NoC

    With the rapid shrinking of technology and growing integration capacity, the probability of failures in Networks-on-Chip (NoCs) increases, and thus fault tolerance is essential. Moreover, the unpredictable locations of these failures may affect the regularity of the underlying topology, and a regular 2D mesh is likely to become irregular. Thus, for these failure-prone networks, a viable routing framework should comprise a topology-agnostic routing algorithm along with a cost-effective, scalable routing mechanism able to handle failures irrespective of any particular failure pattern. Existing routing techniques designed to route irregular topologies efficiently lack flexibility (logic-based), scalability (table-based) or relaxed switch design (uLBDR-based). Designing an efficient routing implementation technique to address irregular topologies remains a pressing research problem. To address this, we present a fault-resilient routing mechanism for irregular 2D meshes resulting from failures. To handle irregularities, it avoids using routing tables and employs a few fixed configuration bits per switch, resulting in a scalable approach. Experiments demonstrate that the proposed approach is guaranteed to tolerate all locations of single and double-link failures and most multiple failures. Also, unlike uLBDR, it is not restricted to any particular switching technique and does not replicate any extra messages. Along with fault tolerance, the proposed mechanism can achieve better network performance in fault-free cases, and it degrades gracefully during failures. Compared to uLBDR, our method has 14% lower area requirements and 16% lower overall power consumption.
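    To make the motivation concrete, the following minimal sketch shows how a single failed link breaks plain XY (dimension-order) routing on a regular 2D mesh. XY routing is used here only as a familiar baseline. It is not the mechanism proposed in the paper, and the function names and coordinates are assumptions made for illustration.

```python
def xy_route(src, dst):
    """Return the links ((x, y), (x2, y2)) that XY routing traverses."""
    x, y = src
    links = []
    while x != dst[0]:                      # first travel along the X dimension
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:                      # then travel along the Y dimension
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def blocked(path, failed_links):
    """True if the path crosses a failed link in either direction."""
    failed = {frozenset(link) for link in failed_links}
    return any(frozenset(link) in failed for link in path)


failed = [((1, 0), (2, 0))]                         # one broken link (hypothetical)
print(blocked(xy_route((0, 0), (2, 1)), failed))    # this route crosses the failed link
print(blocked(xy_route((0, 1), (2, 1)), failed))    # this route avoids it
```

    Because XY routing offers exactly one path per source and destination pair, the first route has no alternative once the link fails, which is the kind of irregularity a fault-resilient routing mechanism has to absorb.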

    Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip

    The sustained demand for faster, more powerful chips has been met by the availability of chip manufacturing processes allowing for the integration of increasing numbers of computation units onto a single die. The resulting outcome, especially in the embedded domain, has often been called SYSTEM-ON-CHIP (SoC) or MULTI-PROCESSOR SYSTEM-ON-CHIP (MP-SoC). MPSoC design brings to the foreground a large number of challenges, one of the most prominent of which is the design of the chip interconnection. With a number of on-chip blocks presently ranging in the tens, and quickly approaching the hundreds, the novel issue of how to best provide on-chip communication resources is clearly felt. NETWORKS-ON-CHIPS (NoCs) are the most comprehensive and scalable answer to this design concern. By bringing large-scale networking concepts to the on-chip domain, they guarantee a structured answer to present and future communication requirements. The point-to-point connection and packet switching paradigms they involve are also of great help in minimizing wiring overhead and physical routing issues. However, as with any technology of recent inception, NoC design is still an evolving discipline. Several main areas of interest require deep investigation for NoCs to become viable solutions:
    • The design of the NoC architecture needs to strike the best tradeoff among performance, features and the tight area and power constraints of the on-chip domain.
    • Simulation and verification infrastructure must be put in place to explore, validate and optimize the NoC performance.
    • NoCs offer a huge design space, thanks to their extreme customizability in terms of topology and architectural parameters. Design tools are needed to prune this space and pick the best solutions.
    • Even more so given their global, distributed nature, it is essential to evaluate the physical implementation of NoCs to assess their suitability for next-generation designs and their area and power costs.
    This dissertation performs a design space exploration of network-on-chip architectures, in order to point out the trade-offs associated with the design of each individual network building block and with the design of the network topology overall. The design space exploration is preceded by a comparative analysis of state-of-the-art interconnect fabrics with each other and with early network-on-chip prototypes. The ultimate objective is to point out the key advantages that NoC realizations provide with respect to state-of-the-art communication infrastructures and to point out the challenges that lie ahead in order to make this new interconnect technology come true. Among the latter, technology-related challenges are emerging that call for dedicated design techniques at all levels of the design hierarchy, in particular leakage power dissipation and the containment of process variations and of their effects. The achievement of the above objectives was enabled by means of a NoC simulation environment for cycle-accurate modelling and simulation and by means of a back-end facility for the study of NoC physical implementation effects. Overall, all the results provided by this work have been validated on actual silicon layout.

    Design of complex integrated systems based on networks-on-chip: Trading off performance, power and reliability

    The steady advancement of microelectronics is associated with an escalating number of challenges for design engineers due to both the tiny dimensions and the enormous complexity of integrated systems. Against this background, this work deals with the Network-on-Chip (NoC) as the emerging design paradigm to cope with diverse issues of nanotechnology. The detailed investigations within the chapters focus on the communication-centric aspects of multi-core systems, with performance, power consumption, and reliability all treated as essential design criteria.

    Decompose and Conquer: Addressing Evasive Errors in Systems on Chip

    Modern computer chips comprise many components, including microprocessor cores, memory modules, on-chip networks, and accelerators. Such system-on-chip (SoC) designs are deployed in a variety of computing devices: from internet-of-things, to smartphones, to personal computers, to data centers. In this dissertation, we discuss evasive errors in SoC designs and how these errors can be addressed efficiently. In particular, we focus on two types of errors: design bugs and permanent faults. Design bugs originate from the limited amount of time allowed for design verification and validation. Thus, they are often found in functional features that are rarely activated. Complete functional verification, which can eliminate design bugs, is extremely time-consuming, thus impractical in modern complex SoC designs. Permanent faults are caused by failures of fragile transistors in nano-scale semiconductor manufacturing processes. Indeed, weak transistors may wear out unexpectedly within the lifespan of the design. Hardware structures that reduce the occurrence of permanent faults incur significant silicon area or performance overheads, thus they are infeasible for most cost-sensitive SoC designs. To tackle and overcome these evasive errors efficiently, we propose to leverage the principle of decomposition to lower the complexity of the software analysis or the hardware structures involved. To this end, we present several decomposition techniques, specific to major SoC components. We first focus on microprocessor cores, by presenting a lightweight bug-masking analysis that decomposes a program into individual instructions to identify if a design bug would be masked by the program's execution. We then move to memory subsystems: there, we offer an efficient memory consistency testing framework to detect buggy memory-ordering behaviors, which decomposes the memory-ordering graph into small components based on incremental differences. 
We also propose a microarchitectural patching solution for memory subsystem bugs, which augments each core node with a small distributed programmable logic, instead of including a global patching module. In the context of on-chip networks, we propose two routing reconfiguration algorithms that bypass faulty network resources. The first computes short-term routes in a distributed fashion, localized to the fault region. The second decomposes application-aware routing computation into simple routing rules so as to quickly find deadlock-free, application-optimized routes in a fault-ridden network. Finally, we consider general accelerator modules in SoC designs. When a system includes many accelerators, there are a variety of interactions among them that must be verified to catch buggy interactions. To this end, we decompose such inter-module communication into basic interaction elements, which can be reassembled into new, interesting tests. Overall, we show that the decomposition of complex software algorithms and hardware structures can significantly reduce overheads: up to three orders of magnitude in the bug-masking analysis and the application-aware routing, approximately 50 times in the routing reconfiguration latency, and 5 times on average in the memory-ordering graph checking. These overhead reductions come with losses in error coverage: 23% undetected bug-masking incidents, 39% non-patchable memory bugs, and occasionally we overlook rare patterns of multiple faults. In this dissertation, we discuss the ideas and their trade-offs, and present future research directions.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147637/1/doowon_1.pd

    SB-Router: A Swapped Buffer Activated Low Latency Network-on-Chip Router

    Switch Allocation (SA) is a critical stage in Network-on-Chip (NoC) routers, and its performance is adversely affected by Head-of-Line (HoL) blocking. In traditionally used Input-Queued Routers (IQR), packets are arranged in a particular order in each Virtual Channel (VC). This implementation is vulnerable to HoL blocking, as the switch allocator can allocate only those packets which are available at the head of a VC. In this paper, the Swapped Buffer (SB) Router architecture is proposed to schedule packets in input buffers by using SB registers. The VCs are designed as SBs, which allows the packets stored in SB registers, along with the head packet of the VC, to participate in SA. The SB register minimizes conflicts in SA and thus reduces HoL blocking, which improves the performance of the NoC. This paper proposes a priority mechanism that prioritizes non-head packets over head packets in case of conflict between them. Two methods are proposed to enhance the performance of the NoC router. First, a VC allocation technique is proposed to optimize the order of packets in the input buffer. Next, SB-Router is combined with the Fill VC allocation technique to further enhance the performance of NoC routers. The performance of the proposed router is evaluated, and the experimental results indicate that our design achieves a latency improvement of 68.75% over the Time-Series Router (TS-Router) for uniform traffic at an injection rate of 0.42 flits/cycle for a 64-node mesh network, with moderate power consumption and area usage. The performance improvement in packet latency for traces from the Princeton Application Repository for Shared-Memory Computers (PARSEC) has also been evaluated. With the achieved reduction in latency, the proposed method has the potential to serve high-speed operations while mapping different applications on multiple-core architectures.
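    Head-of-line blocking can be illustrated with a toy single-router model. In the sketch below each packet is reduced to the output port it requests, and the allocator may consider the first `window` packets of every input queue. `window=1` models strict FIFO input queues, and `window=2` is a rough stand-in for letting one non-head packet compete, loosely in the spirit of the SB register. The model, the names, and the head-first grant order are all assumptions. In particular, it does not reproduce the paper's priority mechanism, which favours non-head packets on conflict.

```python
from collections import deque


def drain_cycles(queues, window):
    """Cycles needed to drain all input queues when the allocator may grant
    any of the first `window` packets of each queue (window=1 is strict FIFO).
    Each packet is represented only by the output port it requests."""
    queues = [deque(q) for q in queues]
    cycles = 0
    while any(queues):
        used_outputs = set()
        for q in queues:                        # fixed priority over inputs
            for i in range(min(window, len(q))):
                if q[i] not in used_outputs:    # output still free this cycle
                    used_outputs.add(q[i])
                    del q[i]
                    break                       # at most one grant per input
        cycles += 1
    return cycles


# Two inputs whose head packets both request output 0.
traffic = [[0, 1], [0, 2]]
print(drain_cycles(traffic, window=1))  # strict FIFO: the second input stalls
print(drain_cycles(traffic, window=2))  # a non-head packet may bypass the stall
```

    With strict FIFO queues the second input idles in the first cycle even though its next packet targets a free output. Widening the window lets that packet go first and shortens the drain time for this traffic pattern.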