40 research outputs found

    Embedded dynamic programming networks for networks-on-chip

    Get PDF
    PhD ThesisRelentless technology downscaling and recent technological advancements in three dimensional integrated circuit (3D-IC) provide a promising prospect to realize heterogeneous system-on-chip (SoC) and homogeneous chip multiprocessor (CMP) based on the networks-onchip (NoCs) paradigm with augmented scalability, modularity and performance. In many cases in such systems, scheduling and managing communication resources are the major design and implementation challenges instead of the computing resources. Past research efforts were mainly focused on complex design-time or simple heuristic run-time approaches to deal with the on-chip network resource management with only local or partial information about the network. This could yield poor communication resource utilizations and amortize the benefits of the emerging technologies and design methods. Thus, the provision for efficient run-time resource management in large-scale on-chip systems becomes critical. This thesis proposes a design methodology for a novel run-time resource management infrastructure that can be realized efficiently using a distributed architecture, which closely couples with the distributed NoC infrastructure. The proposed infrastructure exploits the global information and status of the network to optimize and manage the on-chip communication resources at run-time. There are four major contributions in this thesis. First, it presents a novel deadlock detection method that utilizes run-time transitive closure (TC) computation to discover the existence of deadlock-equivalence sets, which imply loops of requests in NoCs. This detection scheme, TC-network, guarantees the discovery of all true-deadlocks without false alarms in contrast to state-of-the-art approximation and heuristic approaches. Second, it investigates the advantages of implementing future on-chip systems using three dimensional (3D) integration and presents the design, fabrication and testing results of a TC-network implemented in a fully stacked three-layer 3D architecture using a through-silicon via (TSV) complementary metal-oxide semiconductor (CMOS) technology. Testing results demonstrate the effectiveness of such a TC-network for deadlock detection with minimal computational delay in a large-scale network. Third, it introduces an adaptive strategy to effectively diffuse heat throughout the three dimensional network-on-chip (3D-NoC) geometry. This strategy employs a dynamic programming technique to select and optimize the direction of data manoeuvre in NoC. It leads to a tool, which is based on the accurate HotSpot thermal model and SystemC cycle accurate model, to simulate the thermal system and evaluate the proposed approach. Fourth, it presents a new dynamic programming-based run-time thermal management (DPRTM) system, including reactive and proactive schemes, to effectively diffuse heat throughout NoC-based CMPs by routing packets through the coolest paths, when the temperature does not exceed chip’s thermal limit. When the thermal limit is exceeded, throttling is employed to mitigate heat in the chip and DPRTM changes its course to avoid throttled paths and to minimize the impact of throttling on chip performance. This thesis enables a new avenue to explore a novel run-time resource management infrastructure for NoCs, in which new methodologies and concepts are proposed to enhance the on-chip networks for future large-scale 3D integration.Iraqi Ministry of Higher Education and Scientific Research (MOHESR)

    Efficient bypass mechanisms for low latency networks on-chip

    Get PDF
    RESUMEN: La importancia de las redes en-chip en los procesadores multi-núcleo es cada vez mayor. Los routers con baipás son una solución eficiente para reducir la latencia de estas redes. Existen dos tipos de redes con baipás: single-hop y multi-hop. Las redes con baipás single-hop minimizan la latencia individual de cada router al asignar los recursos del router con antelación a la recepción de los paquetes. Las redes con baipás multi-hop, conocidas como SMART, permiten que los paquetes atraviesen múltiples routers en un único ciclo. La primera propuesta de esta tesis es Non-Empty Buffer Bypass (NEBB), un mecanismo que incrementa la utilización del baipás de tipo single-hop, eliminando la necesidad de usar canales virtuales. Para redes con baipás multi-hop propone SMART++ y S-SMART++. SMART++ elimina la necesidad de SMART de usar una gran cantidad de canales virtuales para aprovechar el ancho de banda de la red, permitiendo el diseño de configuraciones de bajo coste. S-SMART++ hace uso de la asignación de recursos de forma especulativa para preparar el baipás de tipo multi-hop. Este mecanismo reduce la latencia y su dependencia con la longitud máxima de los saltos de tipo multi-hop, aspecto clave para su viabilidad en diseños reales. La contribución final es un conjunto de herramientas de código abierto llamada Bypass Simulation Toolset (BST) compuesto por versiones extendidas de BookSim y OpenSMART, una API para integrar BookSim en otros simuladores y una serie de scripts para facilitar el diseño y evaluación de este tipo de redes.ABSTRACT: Networks on-Chip (NoCs) are becoming more important in many-core processors as the number of cores grows. Bypass routers are an efficient solution that skips pipeline stages. There are two types of bypass mechanisms: single-hop and multi-hop bypass. Single-hop bypass minimizes the router delay by skipping allocation stages in each hop. Multi-hop bypass, called SMART, minimizes the effective number of hops by traversing multiple routers in a single cycle. The first proposal of this dissertation is Non-Empty Buffer Bypass (NEBB) for single-hop bypass, which increases the bypass utilization without requiring VCs to match traditional bypass routers. It proposes SMART++ and S-SMART++ for multi-hop bypass. SMART++ removes the requirement of using multiple VCs of SMART to exploit the bandwidth of the network, enabling low-cost configurations. S-SMART++ relies on speculative allocation to set up multi-hop bypass paths. Thus, it reduces latency and its dependency with the maximum length of multi-hops, relaxing the requirements to integrate multi-hop bypass in real designs. The final contribution is an open-source set of tools to simulate bypass NoCs called Bypass Simulation Toolset (BST) conformed by extended versions of BookSim and OpenSMART, an API to integrate BookSim in other simulators, and scripts to simplify the designing and evaluation of such NoCs.This work was supported by the Spanish Ministry of Science, Innovation and Universities, FPI grant BES-2017-079971, and contracts TIN2010-21291-C02-02, TIN2013- 46957-C2-2-P, TIN2015-65316-P, TIN2016-76635-C2-2-R (AEI/FEDER, UE) and TIC PID2019-105660RB-C22; the European HiPEAC Network of Excellence; the European Community's Seventh Framework Programme (FP7/2007-2013), under the Mont-Blanc 1 and 2 projects (grant agreements n 288777 and 610402); the European Union's Horizon 2020 research and innovation programme under the Mont-Blanc 3 project (grant agreement nº 671697). Bluespec Inc. provided access to Bluespec tools

    The effect of an optical network on-chip on the performance of chip multiprocessors

    Get PDF
    Optical networks on-chip (ONoC) have been proposed to reduce power consumption and increase bandwidth density in high performance chip multiprocessors (CMP), compared to electrical NoCs. However, as buffering in an ONoC is not viable, the end-to-end message path needs to be acquired in advance during which the message is buffered at the network ingress. This waiting latency is therefore a combination of path setup latency and contention and forms a significant part of the total message latency. Many proposed ONoCs, such as Single Writer, Multiple Reader (SWMR), avoid path setup latency at the expense of increased optical components. In contrast, this thesis investigates a simple circuit-switched ONoC with lower component count where nodes need to request a channel before transmission. To hide the path setup latency, a coherence-based message predictor is proposed, to setup circuits before message arrival. Firstly, the effect of latency and bandwidth on application performance is thoroughly investigated using full-system simulations of shared memory CMPs. It is shown that the latency of an ideal NoC affects the CMP performance more than the NoC bandwidth. Increasing the number of wavelengths per channel decreases the serialisation latency and improves the performance of both ONoC types. With 2 or more wavelengths modulating at 25 Gbit=s , the ONoCs will outperform a conventional electrical mesh (maximal speedup of 20%). The SWMR ONoC outperforms the circuit-switched ONoC. Next coherence-based prediction techniques are proposed to reduce the waiting latency. The ideal coherence-based predictor reduces the waiting latency by 42%. A more streamlined predictor (smaller than a L1 cache) reduces the waiting latency by 31%. Without prediction, the message latency in the circuit-switched ONoC is 11% larger than in the SWMR ONoC. Applying the realistic predictor reverses this: the message latency in the SWMR ONoC is now 18% larger than the predictive circuitswitched ONoC

    Head-of-Line Blocking Reduction in Power-Efficient Networks-on-Chip

    Full text link
    Tesis por compendioNowadays, thanks to the continuous improvements in the integration scale, more and more cores are added on the same chip, leading to higher system performance. In order to interconnect all nodes, a network-on-chip (NoC) is used, which is in charge of delivering data between cores. However, increasing the number of cores leads to a significant power consumption increase, leading the NoC to be one of the most expensive components in terms of power. Because of this, during the last years, several mechanisms have been proposed to address the NoC power consumption by means of DVFS (Dynamic Voltage and Frequency Scaling) and power-gating strategies. Nevertheless, improvements achieved by these mechanisms are achieved, to a greater or lesser extent, at the cost of system performance, potentially increasing the risk of saturating the network by forming congested points which, in turn, compromise the rest of the system functionality. One side effect is the creation of the "Head-of-Line blocking" effect where congested packets at the head of queues prevent other non-blocked packets from advancing. To address this issue, in this thesis, on one hand, we propose novel congestion control techniques in order to improve system performance by removing the "Head-of-Line" blocking effect. On the other hand, we propose combined solutions adapted to DVFS in order to achieve improvements in terms of performance and power. In addition to this, we propose a path-aware power-gating-based mechanism, which is capable of detecting the flows sharing buffer resources along data paths and perform to switch them off when not needed. With all these combined solutions we can significantly reduce the power consumption of the NoC when compared with state-of-the-art proposals.Hoy en día, gracias a las mejoras en la escala de integración cada vez se integran más y más núcleos en un mismo chip, mejorando así sus prestaciones. Para interconectar todos los nodos dentro del chip se emplea una red en chip (NoC, Network-on-Chip), la cual es la encargada de intercambiar información entre núcleos. No obstante, aumentar el número de núcleos en el chip también conlleva a su vez un importante incremento en el consumo de la NoC, haciendo que ésta se convierta en una de las partes más caras del chip en términos de consumo. Por ello, en los últimos años se han propuesto diversas técnicas de ahorro de energía orientadas a reducir el consumo de la NoC mediante el uso de DVFS (Dynamic Voltage and Frequency Scaling) o estrategias basadas en "power-gating". Sin embargo, éstas mejoras de consumo normalmente se obtienen a costa de sacrificar, en mayor o menor medida, las prestaciones del sistema, aumentado potencialmente así el riesgo de saturar la red, generando puntos de congestión que, a su vez, comprometen el rendimiento del resto del sistema. Un efecto colateral es el "Head-of-Line blocking", mediante el que paquetes congestionados en la cabeza de la cola impiden que otros paquetes no congestionados avancen. Con el fin de solucionar este problema, en ésta tesis, en primer lugar, proponemos técnicas novedosas de control de congestión para incrementar el rendimiento del sistema mediante la eliminación del "Head-of-Line blocking", mientras que, por otra parte, proponemos soluciones combinadas adaptadas a DVFS con el fin de conseguir mejoras en términos de rendimiento y energía. Además, proponemos una técnica de "power-gating" orientada a rutas de datos, la cual es capaz de detectar flujos de datos compartiendo recursos a lo largo de rutas y apagar dichos recursos de forma dinámica cuando no son necesarios. Con todas éstas soluciones combinadas podemos reducir el consumo de energía de la NoC en comparación con otras técnicas presentes en el estado del arte.Hui en dia, gr\`acies a les millores en l'escala d'integraci\'o, cada vegada s'integren m\'es i m\'es nuclis en un mateix xip, la qual cosa millora les seues prestacions. Per tal d'interconectar tots els nodes dins el xip es fa \'us d'una Xarxa en Xip (NoC; Network-on-Chip), la qual \'es l'encarregada d'intercanviar informaci\'o entre els nuclis. No obstant aix\`o, incrementar el nombre de nuclis en el xip tamb\'e comporta un important augment en el consum de la NoC, la qual cosa fa que aquesta es convertisca en una de les parts m\'es costoses del xip en termes de consum. Per aix\`o, en els \'ultims anys s'han proposat diverses t\`ecniques d'estalvi d'energia orientades a reduir el consum de la NoC mitjançant l'\'us de DVFS (Dynamic Voltage and Frequency Scaling) o estrat\`egies basades en ``power-gating''. Malgrat aix\`o, aquestes millores en les prestacions normalment s'obtenen a costa de sacrificar, en major o menor mesura, les prestacions del sistema i augmenta aix\'i el risc de saturar la xarxa al generar-se punts de congesti\'o, que al mateix temps, comprometen el rendiment de la resta del sistema. Un efecte col-lateral \'es el ``Head-of- Line blocking'', mitjançant el qual, els paquets congestionats al cap de la cua, impedixen que altres paquets no congestionats avancen. A fi de solucionar eixe problema, en aquesta tesi, en primer lloc, proposem noves t\`ecniques de control de congesti\'o amb l'objectiu d'incrementar el rendiment del sistema per mitj\`a de l'eliminaci\'o del ``Head-of- Line blocking'', i d'altra banda, proposem solucions combinades adaptades a DVFS amb la finalitat d'aconseguir millores en termes de rendiment i energia. A m\'es, proposem una t\`ecnica de ``power-gating'' orientada a rutes de dades, la qual \'es capa\c c de detectar fluxos de dades al compartir recursos al llarg de les rutes i apagar eixos recursos de forma din\`amica quan no s\'on necessaris. Amb totes aquestes solucions combinades podem reduir el consum d'energia de la NoC en comparaci\'o amb altres t\`ecniques presents en l'estat de l'art.Escamilla López, JV. (2017). Head-of-Line Blocking Reduction in Power-Efficient Networks-on-Chip [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90419TESISCompendi

    MOCAST 2021

    Get PDF
    The 10th International Conference on Modern Circuit and System Technologies on Electronics and Communications (MOCAST 2021) will take place in Thessaloniki, Greece, from July 5th to July 7th, 2021. The MOCAST technical program includes all aspects of circuit and system technologies, from modeling to design, verification, implementation, and application. This Special Issue presents extended versions of top-ranking papers in the conference. The topics of MOCAST include:Analog/RF and mixed signal circuits;Digital circuits and systems design;Nonlinear circuits and systems;Device and circuit modeling;High-performance embedded systems;Systems and applications;Sensors and systems;Machine learning and AI applications;Communication; Network systems;Power management;Imagers, MEMS, medical, and displays;Radiation front ends (nuclear and space application);Education in circuits, systems, and communications

    Network-on-Chip

    Get PDF
    Addresses the Challenges Associated with System-on-Chip Integration Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. This text comprises 12 chapters and covers: The evolution of NoC from SoC—its research and developmental challenges NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces The router design strategies followed in NoCs The evaluation mechanism of NoC architectures The application mapping strategies followed in NoCs Low-power design techniques specifically followed in NoCs The signal integrity and reliability issues of NoC The details of NoC testing strategies reported so far The problem of synthesizing application-specific NoCs Reconfigurable NoC design issues Direction of future research and development in the field of NoC Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, and researchers and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems

    High Performance On-Chip Interconnects Design for Future Many-Core Architectures

    Get PDF
    Switch-based Network-on-Chip (NoC) is a widely accepted inter-core communication infrastructure for Chip Multiprocessors (CMPs). With the continued advance of CMOS technology, the number of cores on a single chip keeps increasing at a rapid pace. It is highly expected that many-core architectures with more than hundreds of processor cores will be commercialized in the near future. In such a large scale CMP system, NoC overheads are more dominant than computation power in determining overall system performance. Also, for modern computational workloads requiring abundant thread level parallelism (TLP), NoC design for highly-parallel, many-core accelerators such as General Purpose Graphics Processing Units (GPGPUs) is of primary importance in harnessing the potential of massive thread- and data-level parallelism. In these contexts, it is critical that NoC provides both low latency and high bandwidth within limited on-chip power and area budgets. In this dissertation, we explore various design issues inherent in future many-core architectures, CMPs and GPGPUs, to achieve both high performance and power efficiency. First, we deal with issues in using a promising next generation memory technology, Spin-Transfer Torque Magnetic RAM (STT-MRAM), for NoC input buffers in CMPs. Using a high density and low leakage memory offers more buffer capacities with the same die footprint, thus helping increase network throughput in NoC routers. However, its long latency and high power consumption in write operations still need to be addressed. Thus, we propose a hybrid design of input buffers using both SRAM and STT-MRAM to hide the long write latency efficiently. Considering that simple data migration in the hybrid buffer consumes more dynamic power compared to SRAM, we provide a lazy migration scheme that reduces the dynamic power consumption of the hybrid buffer. Second, we propose the first NoC router design that uses only STT-MRAM, providing much larger buffer space with less power consumption, while preserving data integrity. To hide the multicycle writes, we employ a multibank STT-MRAM buffer, a virtual channel with multiple banks where every incoming flit is seamlessly pipelined to each bank alternately. Our STT-MRAM design has aggressively reduced the retention time, resulting in a significant reduction in the latency and power overheads of write operations. To ensure data integrity against inadvertent bit flips from the thermal fluctuation during the given retention time, we propose a cost-efficient dynamic buffer refresh scheme combined with Error Correcting Codes (ECC) to detect and correct data corruption. Third, we present schemes for bandwidth-efficient on-chip interconnects in GPGPUs. GPGPUs place a heavy demand on the on-chip interconnect between the many cores and a few memory controllers (MCs). Thus, traffic is highly asymmetric, impacting on-chip resource utilization and system performance. Here, we analyze the communication demands of typical GPGPU applications, and propose efficient NoC designs to meet those demands
    corecore