107 research outputs found

    Experimental Evaluation and Comparison of Time-Multiplexed Multi-FPGA Routing Architectures

    Get PDF
    Emulating large complex designs require multi-FPGA systems (MFS). However, inter-FPGA communication is confronted by the challenge of lack of interconnect capacity due to limited number of FPGA input/output (I/O) pins. Serializing parallel signals onto a single trace effectively addresses the limited I/O pin obstacle. Besides the multiplexing scheme and multiplexing ratio (number of inter-FPGA signals per trace), the choice of the MFS routing architecture also affect the critical path latency. The routing architecture of an MFS is the interconnection pattern of FPGAs, fixed wires and/or programmable interconnect chips. Performance of existing MFS routing architectures is also limited by off-chip interface selection. In this dissertation we proposed novel 2D and 3D latency-optimized time-multiplexed MFS routing architectures. We used rigorous experimental approach and real sequential benchmark circuits to evaluate and compare the proposed and existing MFS routing architectures. This research provides a new insight into the encouraging effects of using off-chip optical interface and three dimensional MFS routing architectures. The vertical stacking results in shorter off-chip links improving the overall system frequency with the additional advantage of smaller footprint area. The proposed 3D architectures employed serialized interconnect between intra-plane and inter-plane FPGAs to address the pin limitation problem. Additionally, all off-chip links are replaced by optical fibers that exhibited latency improvement and resulted in faster MFS. Results indicated that exploiting third dimension provided latency and area improvements as compared to 2D MFS. We also proposed latency-optimized planar 2D MFS architectures in which electrical interconnections are replaced by optical interface in same spatial distribution. Performance evaluation and comparison showed that the proposed architectures have reduced critical path delay and system frequency improvement as compared to conventional MFS. We also experimentally evaluated and compared the system performance of three inter-FPGA communication schemes i.e. Logic Multiplexing, SERDES and MGT in conjunction with two routing architectures i.e. Completely Connected Graph (CCG) and TORUS. Experimental results showed that SERDES attained maximum frequency than the other two schemes. However, for very high multiplexing ratios, the performance of SERDES & MGT became comparable

    Automated Hardware Prototyping for 3D Network on Chips

    Get PDF
    Vor mehr als 50 Jahren stellte Intel® Mitbegründer Gordon Moore eine Prognose zum Entwicklungsprozess der Transistortechnologie auf. Er prognostizierte, dass sich die Zahl der Transistoren in integrierten Schaltungen alle zwei Jahre verdoppeln wird. Seine Aussage ist immer noch gültig, aber ein Ende von Moores Gesetz ist in Sicht. Mit dem Ende von Moore’s Gesetz müssen neue Aspekte untersucht werden, um weiterhin die Leistung von integrierten Schaltungen zu steigern. Zwei mögliche Ansätze für "More than Moore” sind 3D-Integrationsverfahren und heterogene Systeme. Gleichzeitig entwickelt sich ein Trend hin zu Multi-Core Prozessoren, basierend auf Networks on chips (NoCs). Neben dem Ende des Mooreschen Gesetzes ergeben sich bei immer kleiner werdenden Technologiegrößen, vor allem jenseits der 60 nm, neue Herausforderungen. Eine Schwierigkeit ist die Wärmeableitung in großskalierten integrierten Schaltkreisen und die daraus resultierende Überhitzung des Chips. Um diesem Problem in modernen Multi-Core Architekturen zu begegnen, muss auch die Verlustleistung der Netzwerkressourcen stark reduziert werden. Diese Arbeit umfasst eine durch Hardware gesteuerte Kombination aus Frequenzskalierung und Power Gating für 3D On-Chip Netzwerke, einschließlich eines FPGA Prototypen. Dafür wurde ein Takt-synchrones 2D Netzwerk auf ein dreidimensionales asynchrones Netzwerk mit mehreren Frequenzbereichen erweitert. Zusätzlich wurde ein skalierbares Online-Power-Management System mit geringem Ressourcenaufwand entwickelt. Die Verifikation neuer Hardwarekomponenten ist einer der zeitaufwendigsten Schritte im Entwicklungsprozess hochintegrierter digitaler Schaltkreise. Um diese Aufgabe zu beschleunigen und um eine parallele Softwareentwicklung zu ermöglichen, wurde im Rahmen dieser Arbeit ein automatisiertes und benutzerfreundliches Tool für den Entwurf neuer Hardware Projekte entwickelt. Eine grafische Benutzeroberfläche zum Erstellen des gesamten Designablaufs, vom Erstellen der Architektur, Parameter Deklaration, Simulation, Synthese und Test ist Teil dieses Werkzeugs. Zudem stellt die Größe der Architektur für die Erstellung eines Prototypen eine besondere Herausforderung dar. Frühere Arbeiten haben es versäumt, eine schnelles und unkompliziertes Prototyping, insbesondere von Architekturen mit mehr als 50 Prozessorkernen, zu realisieren. Diese Arbeit umfasst eine Design Space Exploration und FPGA-basierte Prototypen von verschiedenen 3D-NoC Implementierungen mit mehr als 80 Prozessoren

    Experimental Evaluation of an NoC Synthesis Tool

    Get PDF
    Rapid growth in the number of IP cores in SoCs resulted in the need for effective and scalable interconnect scheme for system components - Network-on-Chip (NoC). Design and implementation of an NoC from scratch is very time consuming and limits the NoC design space that can be explored. In this thesis we evaluate and compare NoC synthesis tool CONNECT with manually generated NoC design using Altera Quartus II. Three sizes of ring, mesh and torus NoC topologies are used for evaluation based on two metrics: logic resource utilization and maximum clock frequency. For larger NoC sizes manual design provides up to 85% reduction in area utilization. With respect to maximum clock frequency, CONNECT provides superior results for all NoC sizes, providing up to 80% higher clock frequency. These results provide an insight into the area versus frequency tradeoffs when using the CONNECT NoC synthesis tool

    Heracles: A Tool for Fast RTL-Based Design Space Exploration of Multicore Processors

    Get PDF
    This paper presents Heracles, an open-source, functional, parameterized, synthesizable multicore system toolkit. Such a multi/many-core design platform is a powerful and versatile research and teaching tool for architectural exploration and hardware-software co-design. The Heracles toolkit comprises the soft hardware (HDL) modules, application compiler, and graphical user interface. It is designed with a high degree of modularity to support fast exploration of future multicore processors of di erent topologies, routing schemes, processing elements (cores), and memory system organizations. It is a component-based framework with parameterized interfaces and strong emphasis on module reusability. The compiler toolchain is used to map C or C++ based applications onto the processing units. The GUI allows the user to quickly con gure and launch a system instance for easy factorial development and evaluation. Hardware modules are implemented in synthesizable Verilog and are FPGA platform independent. The Heracles tool is freely available under the open-source MIT license at: http://projects.csail.mit.edu/heracle

    Bibliometric Review of NoC Router Optimization

    Get PDF
    Network on chip (NoC) has been proposed as an emerging solution for scalability and performance demands of next generation System on Chip (SoC). NoC provides a solution for the bus based interconnection issue of SoC, where large numbers of Intellectual Property modules (IP) are integrated on a single chip for better performance. The NoC has several advantages such as scalability, low latency and low power consumption, high bandwidth over dedicated wires and buses. Interconnections between multiple chip cores have a significant impact on the communication and performance of the chip design in terms of region, latency, throughput and power. In the NoC architecture, the router is a dominant component that significantly affects the performance of the NoC. NoC router architectures evolved since the year 2002 and progress in the domain pertaining to the optimization in the NoC router architectures has been discussed. The key objective of this bibliometric review is to understand the extent of the existing literature in the domain of performance efficient NoC router architectures. The bibliometric analysis is primarily based on data extracted from Scopus. It reveals that major contributions are done by researchers from USA, China followed by India in the form of conference, journals and articles publications. The major contribution is by the subject areas of Computer Science and Engineering followed by Mathematics and Material Science. The geographical analysis is done by using the GPS visualize tool. The clusters were created using Gephi

    Low Cost Interconnected Architecture for the Hardware Spiking Neural Networks

    Get PDF
    A novel low cost interconnected architecture (LCIA) is proposed in this paper, which is an efficient solution for the neuron interconnections for the hardware spiking neural networks (SNNs). It is based on an all-to-all connection that takes each paired input and output nodes of multi-layer SNNs as the source and destination of connections. The aim is to maintain an efficient routing performance under low hardware overhead. A Networks-on-Chip (NoC) router is proposed as the fundamental component of the LCIA, where an effective scheduler is designed to address the traffic challenge due to irregular spikes. The router can find requests rapidly, make the arbitration decision promptly, and provide equal services to different network traffic requests. Experimental results show that the LCIA can manage the intercommunication of the multi-layer neural networks efficiently and have a low hardware overhead which can maintain the scalability of hardware SNNs

    A Scalable and Adaptive Network on Chip for Many-Core Architectures

    Get PDF
    In this work, a scalable network on chip (NoC) for future many-core architectures is proposed and investigated. It supports different QoS mechanisms to ensure predictable communication. Self-optimization is introduced to adapt the energy footprint and the performance of the network to the communication requirements. A fault tolerance concept allows to deal with permanent errors. Moreover, a template-based automated evaluation and design methodology and a synthesis flow for NoCs is introduced
    corecore