92 research outputs found

    High-Performance and Wavelength-Reused Optical Network on Chip (ONoC) Architectures and Communication Schemes for Manycore Processor

    Get PDF
    Optical Network on Chip (ONoC) is an emerging chip-scale optical interconnection technology to realize the high-performance and power-efficient inter-core communication for many-core processors. By utilizing the silicon photonic interconnects to transmit data packets with optical signals, it can achieve ultra low communication delay, high bandwidth capacity, and low power dissipation. With the benefits of Wavelength Division Multiplexing (WDM), multiple optical signals can simultaneously be transmitted in the same optical interconnect through different wavelengths. Thus, the WDM-based ONoC is becoming a hot research topic recently. However, the maximal number of available wavelengths is restricted for the reliable and power-efficient optical communication in ONoC. Hence, with a limited number of wavelengths, the design of high-performance and power-efficient ONoC architecture is an important and challenging problem. In this thesis, the design methodology of wavelength-reused ONoC architecture is explored. With the wavelength reuse scheme in optical routing paths, high-performance and power-efficient communication is realized for many-core processors only using a small number of available wavelengths. Three wavelength-reused ONoC architectures and communication schemes are proposed to fulfil different communication requirements, i.e., network scalability, multicast communication, and dark silicon. Firstly, WRH-ONoC, a wavelength-reused hierarchical Optical Network on Chip architecture, is proposed to achieve high network scalability, namely obtaining low communication delay and high throughput capacity for hundreds of thousands of cores by reusing the limited number of available wavelengths with the modest hardware cost and energy overhead. WRH-ONoC combines the advantages of non-blocking communication in each lambda-router and wavelength reuse in all lambda-routers through the hierarchical networking. Both theoretical analysis and simulation results indicate that WRH-ONoC can achieve prominent improvement on the communication performance and scalability (e.g., 46.0% of reduction on the zero-load packet delay and 72.7% of improvement on the network throughput for 400 cores with small hardware cost and energy overhead) in comparison with existing schemes. Secondly, DWRMR, a dynamical wavelength-reused multicast scheme based on the optical multicast ring, is proposed for widely existing multicast communications in many-core processors. In DWRMR, an optical multicast ring is dynamically constructed for each multicast group and the multicast packets are transmitted in a single-send-multi-receive manner requiring only one wavelength. All the cores in the same multicast group can reuse the established multicast ring through an optical token arbitration scheme for the interactive multicast communications, thereby avoiding the frequent construction of multicast routing paths dedicatedly for each core. Simulation results indicate that DWRMR can reduce more than 50% of end-to-end packet delay with slight hardware cost, or require only half number of wavelengths to achieve the same performance compared with existing schemes. Thirdly, Dark-ONoC, a dynamically configurable ONoC architecture, is proposed for the many-core processor with dark silicon. Dark silicon is an inevitable phenomenon that only a small number of cores can be activated simultaneously while the other cores must stay in dark state (power-gated) due to the restricted power budget. Dark-ONoC periodically allocates non-blocking optical routing paths only between the active cores with as less wavelengths as possible. Thus, it can obtain high-performance communication and low power consumption at the same time. Extensive simulations are conducted with the dark silicon patterns from both synthetic distribution and real data traces. The simulation results indicate that the number of wavelengths is reduced by around 15% and the overall power consumption is reduced by 23.4% compared to existing schemes. Finally, this thesis concludes several important principles on the design of wavelength-reused ONoC architecture, and summarizes some perspective issues for the future research

    Efficient Interconnection Network Design for Heterogeneous Architectures

    Get PDF
    The onset of big data and deep learning applications, mixed with conventional general-purpose programs, have driven computer architecture to embrace heterogeneity with specialization. With the ever-increasing interconnected chip components, future architectures are required to operate under a stricter power budget and process emerging big data applications efficiently. Interconnection network as the communication backbone thus is facing the grand challenges of limited power envelope, data movement and performance scaling. This dissertation provides interconnect solutions that are specialized to application requirements towards power-/energy-efficient and high-performance computing for heterogeneous architectures. This dissertation examines the challenges of network-on-chip router power-gating techniques for general-purpose workloads to save static power. A voting approach is proposed as an adaptive power-gating policy that considers both local and global traffic status through router voting. In addition, low-latency routing algorithms are designed to guarantee performance in irregular power-gating networks. This holistic solution not only saves power but also avoids performance overhead. This research also introduces emerging computation paradigms to interconnects for big data applications to mitigate the pressure of data movement. Approximate network-on-chip is proposed to achieve high-throughput communication by means of lossy compression. Then, near-data processing is combined with in-network computing to further improve performance while reducing data movement. The two schemes are general to play as plug-ins for different network topologies and routing algorithms. To tackle the challenging computational requirements of deep learning workloads, this dissertation investigates the compelling opportunities of communication algorithm-architecture co-design to accelerate distributed deep learning. MultiTree allreduce algorithm is proposed to bond with message scheduling with network topology to achieve faster and contention-free communication. In addition, the interconnect hardware and flow control are also specialized to exploit deep learning communication characteristics and fulfill the algorithm needs, thereby effectively improving the performance and scalability. By considering application and algorithm characteristics, this research shows that interconnection network can be tailored accordingly to improve the power-/energy-efficiency and performance to satisfy heterogeneous computation and communication requirements

    Best Effort Minimal Routing for Fly-Over: A Light Weight Distributed Mechanism for Energy Efficient Network-On-Chip

    Get PDF
    Scalable Networks-on-Chip (NoCs) have become the de facto interconnection mechanism in large scale Chip Multiprocessors. NoCs devour a large fraction of the on-chip power budget of which static NoC power consumption is becoming the dominant component as technology scales down. Hence, reducing static NoC power consumption is critical for energy-efficient computing. Previous research suggests power-gating routers attached to inactive cores so as to save static power, but requires centralized control and global network knowledge. Moreover, packet deliveries in irregular power-gated network suffer from detour or waiting time overhead to either route around or wake up off routers. Fly-Over (FLOV) is a distributed power-gating mechanism to minimize static power consumption in NoCs without the need for global network information. However, the existing FLOV routing algorithm introduces unnecessary detours and pressurizes the routers in AON column resulting in high packet latencies and network congestion. This work proposes FLOV+, Best-Effort Minimal Routing Algorithm for Fly-Over (FLOV) to route the packets using the shortest path in an irregular power-gated network and also relieve the stress on the AON column. This routing algorithm aims to minimize the average packet latency and to sustain throughput in network with power-gated routers. Synthetic workload evaluations show that the proposed algorithm reduces average packet latency upto 9.84% in an 8-dimensional mesh network. Simulation results also show 50% and 40% improvement in the network throughput for restricted FLOV and generalized FLOV power gating mechanisms respectively

    Application Centric Networks-On-Chip Design Solutions for Future Multicore Systems

    Get PDF
    With advances in technology, future multicore systems scaled to 100s and 1000s of cores/accelerators are being touted as an effective solution for extracting huge performance gains using parallel programming paradigms. However with the failure of Dennard Scaling all the components on the chip cannot be run simultaneously without breaking the power and thermal constraints leading to strict chip power envelops. The scaling up of the number of on chip components has also brought upon Networks-On-Chip (NoC) based interconnect designs like 2D mesh. The contribution of NoC to the total on chip power and overall performance has been increasing steadily and hence high performance power-efficient NoC designs are becoming crucial. Future multicore paradigms can be broadly classified, based on the applications they are tailored to, into traditional Chip Multi processor(CMP) based application based systems, characterized by low core and NoC utilization, and emerging big data application based systems, characterized by large amounts of data movement necessitating high throughput requirements. To this order, we propose NoC design solutions for power-savings in future CMPs tailored to traditional applications and higher effective throughput gains in multicore systems tailored to bandwidth intensive applications. First, we propose Fly-over, a light-weight distributed mechanism for power-gating routers attached to switched off cores to reduce NoC power consumption in low load CMP environment. Secondly, we plan on utilizing a promising next generation memory technology, Spin-Transfer Torque Magnetic RAM(STT-MRAM), to achieve enhanced NoC performance to satisfy the high throughput demands in emerging bandwidth intensive applications, while reducing the power consumption simultaneously. Thirdly, we present a hardware data approximation framework for NoCs, APPROX-NoC, with an online data error control mechanism, which can leverage the approximate computing paradigm in the emerging data intensive big data applications to attain higher performance per watt

    An Efficient NoC-based Framework To Improve Dataflow Thread Management At Runtime

    Get PDF
    This doctoral thesis focuses on how the application threads that are based on dataflow execution model can be managed at Network-on-Chip (NoC) level. The roots of the dataflow execution model date back to the early 1970’s. Applications adhering to such program execution model follow a simple producer-consumer communication scheme for synchronising parallel thread related activities. In dataflow execution environment, a thread can run if and only if all its required inputs are available. Applications running on a large and complex computing environment can significantly benefit from the adoption of dataflow model. In the first part of the thesis, the work is focused on the thread distribution mechanism. It has been shown that how a scalable hash-based thread distribution mechanism can be implemented at the router level with low overheads. To enhance the support further, a tool to monitor the dataflow threads’ status and a simple, functional model is also incorporated into the design. Next, a software defined NoC has been proposed to manage the distribution of dataflow threads by exploiting its reconfigurability. The second part of this work is focused more on NoC microarchitecture level. Traditional 2D-mesh topology is combined with a standard ring, to understand how such hybrid network topology can outperform the traditional topology (such as 2D-mesh). Finally, a mixed-integer linear programming based analytical model has been proposed to verify if the application threads mapped on to the free cores is optimal or not. The proposed mathematical model can be used as a yardstick to verify the solution quality of the newly developed mapping policy. It is not trivial to provide a complete low-level framework for dataflow thread execution for better resource and power management. However, this work could be considered as a primary framework to which improvements could be carried out

    Head-of-Line Blocking Reduction in Power-Efficient Networks-on-Chip

    Full text link
    Tesis por compendioNowadays, thanks to the continuous improvements in the integration scale, more and more cores are added on the same chip, leading to higher system performance. In order to interconnect all nodes, a network-on-chip (NoC) is used, which is in charge of delivering data between cores. However, increasing the number of cores leads to a significant power consumption increase, leading the NoC to be one of the most expensive components in terms of power. Because of this, during the last years, several mechanisms have been proposed to address the NoC power consumption by means of DVFS (Dynamic Voltage and Frequency Scaling) and power-gating strategies. Nevertheless, improvements achieved by these mechanisms are achieved, to a greater or lesser extent, at the cost of system performance, potentially increasing the risk of saturating the network by forming congested points which, in turn, compromise the rest of the system functionality. One side effect is the creation of the "Head-of-Line blocking" effect where congested packets at the head of queues prevent other non-blocked packets from advancing. To address this issue, in this thesis, on one hand, we propose novel congestion control techniques in order to improve system performance by removing the "Head-of-Line" blocking effect. On the other hand, we propose combined solutions adapted to DVFS in order to achieve improvements in terms of performance and power. In addition to this, we propose a path-aware power-gating-based mechanism, which is capable of detecting the flows sharing buffer resources along data paths and perform to switch them off when not needed. With all these combined solutions we can significantly reduce the power consumption of the NoC when compared with state-of-the-art proposals.Hoy en día, gracias a las mejoras en la escala de integración cada vez se integran más y más núcleos en un mismo chip, mejorando así sus prestaciones. Para interconectar todos los nodos dentro del chip se emplea una red en chip (NoC, Network-on-Chip), la cual es la encargada de intercambiar información entre núcleos. No obstante, aumentar el número de núcleos en el chip también conlleva a su vez un importante incremento en el consumo de la NoC, haciendo que ésta se convierta en una de las partes más caras del chip en términos de consumo. Por ello, en los últimos años se han propuesto diversas técnicas de ahorro de energía orientadas a reducir el consumo de la NoC mediante el uso de DVFS (Dynamic Voltage and Frequency Scaling) o estrategias basadas en "power-gating". Sin embargo, éstas mejoras de consumo normalmente se obtienen a costa de sacrificar, en mayor o menor medida, las prestaciones del sistema, aumentado potencialmente así el riesgo de saturar la red, generando puntos de congestión que, a su vez, comprometen el rendimiento del resto del sistema. Un efecto colateral es el "Head-of-Line blocking", mediante el que paquetes congestionados en la cabeza de la cola impiden que otros paquetes no congestionados avancen. Con el fin de solucionar este problema, en ésta tesis, en primer lugar, proponemos técnicas novedosas de control de congestión para incrementar el rendimiento del sistema mediante la eliminación del "Head-of-Line blocking", mientras que, por otra parte, proponemos soluciones combinadas adaptadas a DVFS con el fin de conseguir mejoras en términos de rendimiento y energía. Además, proponemos una técnica de "power-gating" orientada a rutas de datos, la cual es capaz de detectar flujos de datos compartiendo recursos a lo largo de rutas y apagar dichos recursos de forma dinámica cuando no son necesarios. Con todas éstas soluciones combinadas podemos reducir el consumo de energía de la NoC en comparación con otras técnicas presentes en el estado del arte.Hui en dia, gr\`acies a les millores en l'escala d'integraci\'o, cada vegada s'integren m\'es i m\'es nuclis en un mateix xip, la qual cosa millora les seues prestacions. Per tal d'interconectar tots els nodes dins el xip es fa \'us d'una Xarxa en Xip (NoC; Network-on-Chip), la qual \'es l'encarregada d'intercanviar informaci\'o entre els nuclis. No obstant aix\`o, incrementar el nombre de nuclis en el xip tamb\'e comporta un important augment en el consum de la NoC, la qual cosa fa que aquesta es convertisca en una de les parts m\'es costoses del xip en termes de consum. Per aix\`o, en els \'ultims anys s'han proposat diverses t\`ecniques d'estalvi d'energia orientades a reduir el consum de la NoC mitjançant l'\'us de DVFS (Dynamic Voltage and Frequency Scaling) o estrat\`egies basades en ``power-gating''. Malgrat aix\`o, aquestes millores en les prestacions normalment s'obtenen a costa de sacrificar, en major o menor mesura, les prestacions del sistema i augmenta aix\'i el risc de saturar la xarxa al generar-se punts de congesti\'o, que al mateix temps, comprometen el rendiment de la resta del sistema. Un efecte col-lateral \'es el ``Head-of- Line blocking'', mitjançant el qual, els paquets congestionats al cap de la cua, impedixen que altres paquets no congestionats avancen. A fi de solucionar eixe problema, en aquesta tesi, en primer lloc, proposem noves t\`ecniques de control de congesti\'o amb l'objectiu d'incrementar el rendiment del sistema per mitj\`a de l'eliminaci\'o del ``Head-of- Line blocking'', i d'altra banda, proposem solucions combinades adaptades a DVFS amb la finalitat d'aconseguir millores en termes de rendiment i energia. A m\'es, proposem una t\`ecnica de ``power-gating'' orientada a rutes de dades, la qual \'es capa\c c de detectar fluxos de dades al compartir recursos al llarg de les rutes i apagar eixos recursos de forma din\`amica quan no s\'on necessaris. Amb totes aquestes solucions combinades podem reduir el consum d'energia de la NoC en comparaci\'o amb altres t\`ecniques presents en l'estat de l'art.Escamilla López, JV. (2017). Head-of-Line Blocking Reduction in Power-Efficient Networks-on-Chip [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90419TESISCompendi

    Hardware/Software Co-design for Multicore Architectures

    Get PDF
    Siirretty Doriast

    Zuverlässige und Energieeffiziente gemischt-kritische Echtzeit On-Chip Systeme

    Get PDF
    Multi- and many-core embedded systems are increasingly becoming the target for many applications that require high performance under varying conditions. A resulting challenge is the control, and reliable operation of such complex multiprocessing architectures under changes, e.g., high temperature and degradation. In mixed-criticality systems where many applications with varying criticalities are consolidated on the same execution platform, fundamental isolation requirements to guarantee non-interference of critical functions are crucially important. While Networks-on-Chip (NoCs) are the prevalent solution to provide scalable and efficient interconnects for the multiprocessing architectures, their associated energy consumption has immensely increased. Specifically, hard real-time NoCs must manifest limited energy consumption as thermal runaway in such a core shared resource jeopardizes the whole system guarantees. Thus, dynamic energy management of NoCs, as opposed to the related work static solutions, is highly necessary to save energy and decrease temperature, while preserving essential temporal requirements. In this thesis, we introduce a centralized management to provide energy-aware NoCs for hard real-time systems. The design relies on an energy control network, developed on top of an existing switch arbitration network to allow isolation between energy optimization and data transmission. The energy control layer includes local units called Power-Aware NoC controllers that dynamically optimize NoC energy depending on the global state and applications’ temporal requirements. Furthermore, to adapt to abnormal situations that might occur in the system due to degradation, we extend the concept of NoC energy control to include the entire system scope. That is, online resource management employing hierarchical control layers to treat system degradation (imminent core failures) is supported. The mechanism applies system reconfiguration that involves workload migration. For mixed-criticality systems, it allows flexible boundaries between safety-critical and non-critical subsystems to safely apply the reconfiguration, preserving fundamental safety requirements and temporal predictability. Simulation and formal analysis-based experiments on various realistic usecases and benchmarks are conducted showing significant improvements in NoC energy-savings and in treatment of system degradation for mixed-criticality systems improving dependability over the status quo.Eingebettete Many- und Multi-core-Systeme werden zunehmend das Ziel für Anwendungen, die hohe Anfordungen unter unterschiedlichen Bedinungen haben. Für solche hochkomplexed Multi-Prozessor-Systeme ist es eine grosse Herausforderung zuverlässigen Betrieb sicherzustellen, insbesondere wenn sich die Umgebungseinflüsse verändern. In Systeme mit gemischter Kritikalität, in denen viele Anwendungen mit unterschiedlicher Kritikalität auf derselben Ausführungsplattform bedient werden müssen, sind grundlegende Isolationsanforderungen zur Gewährleistung der Nichteinmischung kritischer Funktionen von entscheidender Bedeutung. Während On-Chip Netzwerke (NoCs) häufig als skalierbare Verbindung für die Multiprozessor-Architekturen eingesetzt werden, ist der damit verbundene Energieverbrauch immens gestiegen. Daher sind dynamische Plattformverwaltungen, im Gegensatz zu den statischen, zwingend notwendig, um ein System an die oben genannten Veränderungen anzupassen und gleichzeitig Timing zu gewährleisten. In dieser Arbeit entwickeln wir energieeffiziente NoCs für harte Echtzeitsysteme. Das Design basiert auf einem Energiekontrollnetzwerk, das auf einem bestehenden Switch-Arbitration-Netzwerk entwickelt wurde, um eine Isolierung zwischen Energieoptimierung und Datenübertragung zu ermöglichen. Die Energiesteuerungsschicht umfasst lokale Einheiten, die als Power-Aware NoC-Controllers bezeichnet werden und die die NoC-Energie in Abhängigkeit vom globalen Zustand und den zeitlichen Anforderungen der Anwendungen optimieren. Darüber hinaus wird das Konzept der NoC-Energiekontrolle zur Anpassung an Anomalien, die aufgrund von Abnutzung auftreten können, auf den gesamten Systemumfang ausgedehnt. Online- Ressourcenverwaltungen, die hierarchische Kontrollschichten zur Behandlung Abnutzung (drohender Kernausfälle) einsetzen, werden bereitgestellt. Bei Systemen mit gemischter Kritikalität erlaubt es flexible Grenzen zwischen sicherheitskritischen und unkritischen Subsystemen, um die Rekonfiguration sicher anzuwenden, wobei grundlegende Sicherheitsanforderungen erhalten bleiben und Timing Vorhersehbarkeit. Experimente werden auf der Basis von Simulationen und formalen Analysen zu verschiedenen realistischen Anwendungsfallen und Benchmarks durchgeführt, die signifikanten Verbesserungen bei On-Chip Netzwerke-Energieeinsparungen und bei der Behandlung von Abnutzung für Systeme mit gemischter Kritikalität zur Verbesserung die Systemstabilität gegenüber dem bisherigen Status quo zeigen
    corecore