1,103 research outputs found
Reconfigurable Data Planes for Scalable Network Virtualization
Abstract—Network virtualization presents a powerful approach to share physical network infrastructure among multiple virtual networks. Recent advances in network virtualization advocate the use of field-programmable gate arrays (FPGAs) as flexible high performance alternatives to conventional host virtualization techniques. However, the limited on-chip logic and memory resources in FPGAs severely restrict the scalability of the virtualization platform and necessitate the implementation of efficient forwarding structures in hardware. The research described in this manuscript explores the implementation of a scalable heterogeneous network virtualization platform which integrates virtual data planes implemented in FPGAs with software data planes created using host virtualization techniques. The system exploits data plane heterogeneity to cater to the dynamic service requirements of virtual networks by migrating networks between software and hardware data planes. We demonstrate data plane migration as an effective technique to limit the impact of traffic on unmodified data planes during FPGA reconfiguration. Our system implements forwarding tables in a shared fashion using inexpensive off-chip memories and supports both Internet Protocol (IP) and non-IP based data planes. Experimental results show that FPGA-based data planes can offer two orders of magnitude better throughput than their software counterparts and FPGA reconfiguration can facilitate data plane customization within 12 seconds. An integrated system that supports up to 15 virtual networks has been validated on the NetFPGA platform
An analytical performance model for the Spidergon NoC
Networks on chip (NoC) emerged as a promising alternative to bus-based interconnect networks to handle the increasing communication requirements of the large systems on chip. Employing an appropriate topology for a NoC is of high importance mainly because it typically trade-offs between cross-cutting concerns such as performance and cost. The spidergon topology is a novel architecture which is proposed recently for NoC domain. The objective of the spidergon NoC has been addressing the need for a fixed and optimized topology to realize cost effective multi-processor SoC (MPSoC) development [7]. In this paper we analyze the traffic behavior in the spidergon scheme and present an analytical evaluation of the average message latency in the architecture. We prove the validity of the analysis by comparing the model against the results produced by a discreteevent simulator
Automated Hardware Prototyping for 3D Network on Chips
Vor mehr als 50 Jahren stellte Intel® Mitbegründer Gordon Moore eine Prognose zum Entwicklungsprozess der Transistortechnologie auf. Er prognostizierte, dass sich die Zahl der Transistoren in integrierten Schaltungen alle zwei Jahre verdoppeln wird. Seine Aussage ist immer noch gültig, aber ein Ende von Moores Gesetz ist in Sicht. Mit dem Ende von Moore’s Gesetz müssen neue Aspekte untersucht werden, um weiterhin die Leistung von integrierten Schaltungen zu steigern. Zwei mögliche Ansätze für "More than Moore” sind 3D-Integrationsverfahren und heterogene Systeme. Gleichzeitig entwickelt sich ein Trend hin zu Multi-Core Prozessoren, basierend auf Networks on chips (NoCs).
Neben dem Ende des Mooreschen Gesetzes ergeben sich bei immer kleiner werdenden Technologiegrößen, vor allem jenseits der 60 nm, neue Herausforderungen. Eine Schwierigkeit ist die Wärmeableitung in großskalierten integrierten Schaltkreisen und die daraus resultierende Überhitzung des Chips. Um diesem Problem in modernen Multi-Core Architekturen zu begegnen, muss auch die Verlustleistung der Netzwerkressourcen stark reduziert werden. Diese Arbeit umfasst eine durch Hardware gesteuerte Kombination aus Frequenzskalierung und Power Gating für 3D On-Chip Netzwerke, einschließlich eines FPGA Prototypen. Dafür wurde ein Takt-synchrones 2D Netzwerk auf ein dreidimensionales asynchrones Netzwerk mit mehreren Frequenzbereichen erweitert. Zusätzlich wurde ein skalierbares Online-Power-Management System mit geringem Ressourcenaufwand entwickelt.
Die Verifikation neuer Hardwarekomponenten ist einer der zeitaufwendigsten Schritte im Entwicklungsprozess hochintegrierter digitaler Schaltkreise. Um diese Aufgabe zu beschleunigen und um eine parallele Softwareentwicklung zu ermöglichen, wurde im Rahmen dieser Arbeit ein automatisiertes und benutzerfreundliches Tool für den Entwurf neuer Hardware Projekte entwickelt. Eine grafische Benutzeroberfläche zum Erstellen des gesamten Designablaufs, vom Erstellen der Architektur, Parameter Deklaration, Simulation, Synthese und Test ist Teil dieses Werkzeugs. Zudem stellt die Größe der Architektur für die Erstellung eines Prototypen eine besondere Herausforderung dar. Frühere Arbeiten haben es versäumt, eine schnelles und unkompliziertes Prototyping, insbesondere von Architekturen mit mehr als 50 Prozessorkernen, zu realisieren. Diese Arbeit umfasst eine Design Space Exploration und FPGA-basierte Prototypen von verschiedenen 3D-NoC Implementierungen mit mehr als 80 Prozessoren
Interconnect architectures for dynamically partially reconfigurable systems
Dynamically partially reconfigurable FPGAs (Field-Programmable Gate Arrays) allow
hardware modules to be placed and removed at runtime while other parts of the system
keep working. With their potential benefits, they have been the topic of a great
deal of research over the last decade. To exploit the partial reconfiguration capability of
FPGAs, there is a need for efficient, dynamically adaptive communication infrastructure
that automatically adapts as modules are added to and removed from the system.
Many bus and network-on-chip (NoC) architectures have been proposed to exploit this
capability on FPGA technology. However, few realizations have been reported in the
public literature to demonstrate or compare their performance in real world applications.
While partial reconfiguration can offer many benefits, it is still rarely exploited in practical
applications. Few full realizations of partially reconfigurable systems in current
FPGA technologies have been published. More application experiments are required to
understand the benefits and limitations of implementing partially reconfigurable systems
and to guide their further development. The motivation of this thesis is to fill this
research gap by providing empirical evidence of the cost and benefits of different interconnect
architectures. The results will provide a baseline for future research and will
be directly useful for circuit designers who must make a well-reasoned choice between
the alternatives.
This thesis contains the results of experiments to compare different NoC and bus interconnect
architectures for FPGA-based designs in general and dynamically partially
reconfigurable systems. These two interconnect schemes are implemented and evaluated
in terms of performance, area and power consumption using FFT (Fast Fourier
Transform) andANN(Artificial Neural Network) systems as benchmarks. Conclusions
drawn from these results include recommendations concerning the interconnect approach
for different kinds of applications. It is found that a NoC provides much better
performance than a single channel bus and similar performance to a multi-channel bus
in both parallel and parallel-pipelined FFT systems. This suggests that a NoC is a better choice for systems with multiple simultaneous communications like the FFT. Bus-based
interconnect achieves better performance and consume less area and power than NoCbased
scheme for the fully-connected feed-forward NN system. This suggests buses
are a better choice for systems that do not require many simultaneous communications
or systems with broadcast communications like a fully-connected feed-forward NN.
Results from the experiments with dynamic partial reconfiguration demonstrate that
buses have the advantages of better resource utilization and smaller reconfiguration
time and memory than NoCs. However, NoCs are more flexible and expansible. They
have the advantage of placing almost all of the communication infrastructure in the
dynamic reconfiguration region. This means that different applications running on the
FPGA can use different interconnection strategies without the overhead of fixed bus
resources in the static region.
Another objective of the research is to examine the partial reconfiguration process and
reconfiguration overhead with current FPGA technologies. Partial reconfiguration allows
users to efficiently change the number of running PEs to choose an optimal powerperformance
operating point at the minimum cost of reconfiguration. However, this
brings drawbacks including resource utilization inefficiency, power consumption overhead
and decrease in system operating frequency. The experimental results report a
50% of resource utilization inefficiency with a power consumption overhead of less
than 5% and a decrease in frequency of up to 32% compared to a static implementation.
The results also show that most of the drawbacks of partial reconfiguration implementation
come from the restrictions and limitations of partial reconfiguration design flow.
If these limitations can be addressed, partial reconfiguration should still be considered
with its potential benefits.Thesis (Ph.D.) -- University of Adelaide, School of Electrical and Electronic Engineering, 201
Will SDN be part of 5G?
For many, this is no longer a valid question and the case is considered
settled with SDN/NFV (Software Defined Networking/Network Function
Virtualization) providing the inevitable innovation enablers solving many
outstanding management issues regarding 5G. However, given the monumental task
of softwarization of radio access network (RAN) while 5G is just around the
corner and some companies have started unveiling their 5G equipment already,
the concern is very realistic that we may only see some point solutions
involving SDN technology instead of a fully SDN-enabled RAN. This survey paper
identifies all important obstacles in the way and looks at the state of the art
of the relevant solutions. This survey is different from the previous surveys
on SDN-based RAN as it focuses on the salient problems and discusses solutions
proposed within and outside SDN literature. Our main focus is on fronthaul,
backward compatibility, supposedly disruptive nature of SDN deployment,
business cases and monetization of SDN related upgrades, latency of general
purpose processors (GPP), and additional security vulnerabilities,
softwarization brings along to the RAN. We have also provided a summary of the
architectural developments in SDN-based RAN landscape as not all work can be
covered under the focused issues. This paper provides a comprehensive survey on
the state of the art of SDN-based RAN and clearly points out the gaps in the
technology.Comment: 33 pages, 10 figure
High performance communication on reconfigurable clusters
High Performance Computing (HPC) has matured to where it is an essential third pillar, along with theory and experiment, in most domains of science and engineering. Communication latency is a key factor that is limiting the performance of HPC, but can be addressed by integrating communication into accelerators. This integration allows accelerators to communicate with each other without CPU interactions, and even bypassing the network stack. Field Programmable Gate Arrays (FPGAs) are the accelerators that currently best integrate communication with computation. The large number of Multi-gigabit Transceivers (MGTs) on most high-end FPGAs can provide high-bandwidth and low-latency inter-FPGA connections. Additionally, the reconfigurable FPGA fabric enables tight coupling between computation kernel and network interface.
Our thesis is that an application-aware communication infrastructure for a multi-FPGA system makes substantial progress in solving the HPC communication bottleneck. This dissertation aims to provide an application-aware solution for communication infrastructure for FPGA-centric clusters. Specifically, our solution demonstrates application-awareness across multiple levels in the network stack, including low-level link protocols, router microarchitectures, routing algorithms, and applications.
We start by investigating the low-level link protocol and the impact of its latency variance on performance. Our results demonstrate that, although some link jitter is always present, we can still assume near-synchronous communication on an FPGA-cluster. This provides the necessary condition for statically-scheduled routing. We then propose two novel router microarchitectures for two different kinds of workloads: a wormhole Virtual Channel (VC)-based router for workloads with dynamic communication, and a statically-scheduled Virtual Output Queueing (VOQ)-based router for workloads with static communication. For the first (VC-based) router, we propose a framework that generates application-aware router configurations. Our results show that, by adding application-awareness into router configuration, the network performance of FPGA clusters can be substantially improved. For the second (VOQ-based) router, we propose a novel offline collective routing algorithm. This shows a significant advantage over a state-of-the-art collective routing algorithm.
We apply our communication infrastructure to a critical strong-scaling HPC kernel, the 3D FFT. The experimental results demonstrate that the performance of our design is faster than that on CPUs and GPUs by at least one order of magnitude (achieving strong scaling for the target applications). Surprisingly, the FPGA cluster performance is similar to that of an ASIC-cluster. We also implement the 3D FFT on another multi-FPGA platform: the Microsoft Catapult II cloud. Its performance is also comparable or superior to CPU and GPU HPC clusters. The second application we investigate is Molecular Dynamics Simulation (MD). We model MD on both FPGA clouds and clusters. We find that combining processing and general communication in the same device leads to extremely promising performance and the prospect of MD simulations well into the us/day range with a commodity cloud
- …