403 research outputs found
Master of Science
thesisIntegrated circuits often consist of multiple processing elements that are regularly tiled across the two-dimensional surface of a die. This work presents the design and integration of high speed relative timed routers for asynchronous network-on-chip. It researches NoC's efficiency through simplicity by directly translating simple T-router, source-routing, single-flit packet to higher radix routers. This work is intended to study performance and power trade-offs adding higher radix routers, 3D topologies, Virtual Channels, Accurate NoC modeling, and Transmission line communication links. Routers with and without virtual channels are designed and integrated to arrayed communication networks. Furthermore, the work investigates 3D networks with diffusive RC wires and transmission lines on long wrap interconnects
Automated Hardware Prototyping for 3D Network on Chips
Vor mehr als 50 Jahren stellte Intel® Mitbegründer Gordon Moore eine Prognose zum Entwicklungsprozess der Transistortechnologie auf. Er prognostizierte, dass sich die Zahl der Transistoren in integrierten Schaltungen alle zwei Jahre verdoppeln wird. Seine Aussage ist immer noch gültig, aber ein Ende von Moores Gesetz ist in Sicht. Mit dem Ende von Moore’s Gesetz müssen neue Aspekte untersucht werden, um weiterhin die Leistung von integrierten Schaltungen zu steigern. Zwei mögliche Ansätze für "More than Moore” sind 3D-Integrationsverfahren und heterogene Systeme. Gleichzeitig entwickelt sich ein Trend hin zu Multi-Core Prozessoren, basierend auf Networks on chips (NoCs).
Neben dem Ende des Mooreschen Gesetzes ergeben sich bei immer kleiner werdenden Technologiegrößen, vor allem jenseits der 60 nm, neue Herausforderungen. Eine Schwierigkeit ist die Wärmeableitung in großskalierten integrierten Schaltkreisen und die daraus resultierende Überhitzung des Chips. Um diesem Problem in modernen Multi-Core Architekturen zu begegnen, muss auch die Verlustleistung der Netzwerkressourcen stark reduziert werden. Diese Arbeit umfasst eine durch Hardware gesteuerte Kombination aus Frequenzskalierung und Power Gating für 3D On-Chip Netzwerke, einschließlich eines FPGA Prototypen. Dafür wurde ein Takt-synchrones 2D Netzwerk auf ein dreidimensionales asynchrones Netzwerk mit mehreren Frequenzbereichen erweitert. Zusätzlich wurde ein skalierbares Online-Power-Management System mit geringem Ressourcenaufwand entwickelt.
Die Verifikation neuer Hardwarekomponenten ist einer der zeitaufwendigsten Schritte im Entwicklungsprozess hochintegrierter digitaler Schaltkreise. Um diese Aufgabe zu beschleunigen und um eine parallele Softwareentwicklung zu ermöglichen, wurde im Rahmen dieser Arbeit ein automatisiertes und benutzerfreundliches Tool für den Entwurf neuer Hardware Projekte entwickelt. Eine grafische Benutzeroberfläche zum Erstellen des gesamten Designablaufs, vom Erstellen der Architektur, Parameter Deklaration, Simulation, Synthese und Test ist Teil dieses Werkzeugs. Zudem stellt die Größe der Architektur für die Erstellung eines Prototypen eine besondere Herausforderung dar. Frühere Arbeiten haben es versäumt, eine schnelles und unkompliziertes Prototyping, insbesondere von Architekturen mit mehr als 50 Prozessorkernen, zu realisieren. Diese Arbeit umfasst eine Design Space Exploration und FPGA-basierte Prototypen von verschiedenen 3D-NoC Implementierungen mit mehr als 80 Prozessoren
Network-on-Chip
Limitations of bus-based interconnections related to scalability, latency, bandwidth, and power consumption for supporting the related huge number of on-chip resources result in a communication bottleneck. These challenges can be efficiently addressed with the implementation of a network-on-chip (NoC) system. This book gives a detailed analysis of various on-chip communication architectures and covers different areas of NoCs such as potentials, architecture, technical challenges, optimization, design explorations, and research directions. In addition, it discusses current and future trends that could make an impactful and meaningful contribution to the research and design of on-chip communications and NoC systems
Exploration and Design of Power-Efficient Networked Many-Core Systems
Multiprocessing is a promising solution to meet the requirements of near future applications. To get full benefit from parallel processing, a manycore system needs efficient, on-chip communication architecture. Networkon- Chip (NoC) is a general purpose communication concept that offers highthroughput, reduced power consumption, and keeps complexity in check by a regular composition of basic building blocks. This thesis presents power efficient communication approaches for networked many-core systems. We address a range of issues being important for designing power-efficient manycore systems at two different levels: the network-level and the router-level.
From the network-level point of view, exploiting state-of-the-art concepts such as Globally Asynchronous Locally Synchronous (GALS), Voltage/ Frequency Island (VFI), and 3D Networks-on-Chip approaches may be a solution to the excessive power consumption demanded by today’s and future many-core systems. To this end, a low-cost 3D NoC architecture, based on high-speed GALS-based vertical channels, is proposed to mitigate high peak temperatures, power densities, and area footprints of vertical interconnects in 3D ICs. To further exploit the beneficial feature of a negligible inter-layer distance of 3D ICs, we propose a novel hybridization scheme for inter-layer communication. In addition, an efficient adaptive routing algorithm is presented which enables congestion-aware and reliable communication for the hybridized NoC architecture. An integrated monitoring and management platform on top of this architecture is also developed in order to implement more scalable power optimization techniques.
From the router-level perspective, four design styles for implementing power-efficient reconfigurable interfaces in VFI-based NoC systems are proposed. To enhance the utilization of virtual channel buffers and to manage their power consumption, a partial virtual channel sharing method for NoC routers is devised and implemented.
Extensive experiments with synthetic and real benchmarks show significant power savings and mitigated hotspots with similar performance compared to latest NoC architectures. The thesis concludes that careful codesigned elements from different network levels enable considerable power savings for many-core systems.Siirretty Doriast
New Fault Tolerant Multicast Routing Techniques to Enhance Distributed-Memory Systems Performance
Distributed-memory systems are a key to achieve high performance computing and the most favorable architectures used in advanced research problems. Mesh connected multicomputer are one of the most popular architectures that have been implemented in many distributed-memory systems. These systems must support communication operations efficiently to achieve good performance. The wormhole switching technique has been widely used in design of distributed-memory systems in which the packet is divided into small flits. Also, the multicast communication has been widely used in distributed-memory systems which is one source node sends the same message to several destination nodes. Fault tolerance refers to the ability of the system to operate correctly in the presence of faults. Development of fault tolerant multicast routing algorithms in 2D mesh networks is an important issue. This dissertation presents, new fault tolerant multicast routing algorithms for distributed-memory systems performance using wormhole routed 2D mesh. These algorithms are described for fault tolerant routing in 2D mesh networks, but it can also be extended to other topologies. These algorithms are a combination of a unicast-based multicast algorithm and tree-based multicast algorithms. These algorithms works effectively for the most commonly encountered faults in mesh networks, f-rings, f-chains and concave fault regions. It is shown that the proposed routing algorithms are effective even in the presence of a large number of fault regions and large size of fault region. These algorithms are proved to be deadlock-free. Also, the problem of fault regions overlap is solved. Four essential performance metrics in mesh networks will be considered and calculated; also these algorithms are a limited-global-information-based multicasting which is a compromise of local-information-based approach and global-information-based approach. Data mining is used to validate the results and to enlarge the sample. The proposed new multicast routing techniques are used to enhance the performance of distributed-memory systems. Simulation results are presented to demonstrate the efficiency of the proposed algorithms
Quarc: an architecture for efficient on-chip communication
The exponential downscaling of the feature size has enforced a paradigm shift from computation-based design to communication-based design in system on chip development. Buses, the traditional communication architecture in systems on chip, are incapable of addressing the increasing bandwidth requirements of future large systems.
Networks on chip have emerged as an interconnection architecture offering unique solutions to the technological and design issues related to communication in future systems on chip. The transition from buses as a shared medium to networks on chip as a segmented medium has given rise to new challenges in system on chip realm.
By leveraging the shared nature of the communication medium, buses have been highly efficient in delivering multicast communication. The segmented nature of networks, however, inhibits the multicast messages to be delivered as efficiently by networks on chip. Relying on extensive research on multicast communication in parallel computers, several network on chip architectures have offered mechanisms to perform the operation, while conforming to resource constraints of the network on chip paradigm. Multicast communication in majority of these networks on chip is implemented by establishing a connection between source and all multicast destinations before the message transmission
commences. Establishing the connections incurs an overhead and, therefore, is not desirable; in particular in latency sensitive services such as cache coherence.
To address high performance multicast communication, this research presents Quarc, a novel network on chip architecture. The Quarc architecture targets an area-efficient, low power, high performance implementation. The thesis covers a detailed representation of
the building blocks of the architecture, including topology, router and network interface.
The cost and performance comparison of the Quarc architecture against other network on chip architectures reveals that the Quarc architecture is a highly efficient architecture.
Moreover, the thesis introduces novel performance models of complex traffic patterns, including multicast and quality of service-aware communication
Embedded dynamic programming networks for networks-on-chip
PhD ThesisRelentless technology downscaling and recent technological advancements
in three dimensional integrated circuit (3D-IC) provide a promising
prospect to realize heterogeneous system-on-chip (SoC) and homogeneous
chip multiprocessor (CMP) based on the networks-onchip
(NoCs) paradigm with augmented scalability, modularity and
performance. In many cases in such systems, scheduling and managing
communication resources are the major design and implementation
challenges instead of the computing resources. Past research
efforts were mainly focused on complex design-time or simple heuristic
run-time approaches to deal with the on-chip network resource
management with only local or partial information about the network.
This could yield poor communication resource utilizations and amortize
the benefits of the emerging technologies and design methods.
Thus, the provision for efficient run-time resource management in
large-scale on-chip systems becomes critical. This thesis proposes a
design methodology for a novel run-time resource management infrastructure
that can be realized efficiently using a distributed architecture,
which closely couples with the distributed NoC infrastructure. The
proposed infrastructure exploits the global information and status
of the network to optimize and manage the on-chip communication
resources at run-time.
There are four major contributions in this thesis. First, it presents a
novel deadlock detection method that utilizes run-time transitive closure
(TC) computation to discover the existence of deadlock-equivalence
sets, which imply loops of requests in NoCs. This detection scheme,
TC-network, guarantees the discovery of all true-deadlocks without
false alarms in contrast to state-of-the-art approximation and heuristic
approaches. Second, it investigates the advantages of implementing
future on-chip systems using three dimensional (3D) integration and
presents the design, fabrication and testing results of a TC-network
implemented in a fully stacked three-layer 3D architecture using a
through-silicon via (TSV) complementary metal-oxide semiconductor
(CMOS) technology. Testing results demonstrate the effectiveness
of such a TC-network for deadlock detection with minimal computational
delay in a large-scale network. Third, it introduces an adaptive
strategy to effectively diffuse heat throughout the three dimensional
network-on-chip (3D-NoC) geometry. This strategy employs a dynamic
programming technique to select and optimize the direction of data
manoeuvre in NoC. It leads to a tool, which is based on the accurate
HotSpot thermal model and SystemC cycle accurate model, to simulate
the thermal system and evaluate the proposed approach. Fourth, it
presents a new dynamic programming-based run-time thermal management
(DPRTM) system, including reactive and proactive schemes, to
effectively diffuse heat throughout NoC-based CMPs by routing packets
through the coolest paths, when the temperature does not exceed
chip’s thermal limit. When the thermal limit is exceeded, throttling is
employed to mitigate heat in the chip and DPRTM changes its course
to avoid throttled paths and to minimize the impact of throttling on
chip performance.
This thesis enables a new avenue to explore a novel run-time resource
management infrastructure for NoCs, in which new methodologies
and concepts are proposed to enhance the on-chip networks for
future large-scale 3D integration.Iraqi Ministry of Higher Education and Scientific Research (MOHESR)
- …