486 research outputs found
Fast, Accurate and Detailed NoC Simulations
Network-on-Chip (NoC) architectures have a wide variety of parameters that can be adapted to the designer's requirements. Fast exploration of this parameter space is only possible at a high-level and several methods have been proposed. Cycle and bit accurate simulation is necessary when the actual router's RTL description needs to be evaluated and verified. However, extensive simulation of the NoC architecture with cycle and bit accuracy is prohibitively time consuming. In this paper we describe a simulation method to simulate large parallel homogeneous and heterogeneous network-on-chips on a single FPGA. The method is especially suitable for parallel systems where lengthy cycle and bit accurate simulations are required. As a case study, we use a NoC that was modelled and simulated in SystemC. We simulate the same NoC on the described FPGA simulator. This enables us to observe the NoC behavior under a large variety of traffic patterns. Compared with the SystemC simulation we achieved a speed-up of 80-300, without compromising the cycle and bit level accuracy
Recommended from our members
On Multicast in Asynchronous Networks-on-Chip: Techniques, Architectures, and FPGA Implementation
In this era of exascale computing, conventional synchronous design techniques are facing unprecedented challenges. The consumer electronics market is replete with many-core systems in the range of 16 cores to thousands of cores on chip, integrating multi-billion transistors. However, with this ever increasing complexity, the traditional design approaches are facing key issues such as increasing chip power, process variability, aging, thermal problems, and scalability. An alternative paradigm that has gained significant interest in the last decade is asynchronous design. Asynchronous designs have several potential advantages: they are naturally energy proportional, burning power only when active, do not require complex clock distribution, are robust to different forms of variability, and provide ease of composability for heterogeneous platforms. Networks-on-chip (NoCs) is an interconnect paradigm that has been introduced to deal with the ever-increasing system complexity. NoCs provide a distributed, scalable, and efficient interconnect solution for today’s many-core systems. Moreover, NoCs are a natural match with asynchronous design techniques, as they separate communication infrastructure and timing from the computational elements. To this end, globally-asynchronous locally-synchronous (GALS) systems that interconnect multiple processing cores, operating at different clock speeds, using an asynchronous NoC, have gained significant interest. While asynchronous NoCs have several advantages, they also face a key challenge of supporting new types of traffic patterns. Once such pattern is multicast communication, where a source sends packets to arbitrary number of destinations. Multicast is not only common in parallel computing, such as for cache coherency, but also for emerging areas such as neuromorphic computing. This important capability has been largely missing from asynchronous NoCs. This thesis introduces several efficient multicast solutions for these interconnects. In particular, techniques, and network architectures are introduced to support high-performance and low-power multicast. Two leading network topologies are the focus: a variant mesh-of-trees (MoT) and a 2D mesh. In addition, for a more realistic implementation and analysis, as well as significantly advancing the field of asynchronous NoCs, this thesis also targets synthesis of these NoCs on commercial FPGAs. While there has been significant advances in FPGA technologies, there has been only limited research on implementing asynchronous NoCs on FPGAs. To this end, a systematic computeraided design (CAD) methodology has been introduced to efficiently and safely map asynchronous NoCs on FPGAs. Overall, this thesis makes the following three contributions. The first contribution is a multicast solution for a variant MoT network topology. This topology consists of simple low-radix switches, and has been used in high-performance computing platforms. A novel local speculation technique is introduced, where a subset of the network’s switches are speculative that always broadcast every packet. These switches are very simple and have high performance. Speculative switches are surrounded by non-speculative ones that route packets based on their destinations and also throttle any redundant copies created by the former. This hybrid network architecture achieved significant performance and power benefits over other multicast approaches. The second contribution is a multicast solution for a 2D-mesh topology, which is more complex with higher-radix switches and also is more commonly used. A novel continuous-time replication strategy is introduced to optimize the critical multi-way forking operation of a multicast transmission. In this technique, a multicast packet is first stored in an input port of a switch, from where it is sent through distinct output ports towards different destinations concurrently, at each output’s own rate and in continuous time. This strategy is shown to have significant latency and energy benefits over an approach that performs multicast using multiple distinct serial unicasts to each destination. Finally, a systematic CAD methodology is introduced to synthesize asynchronous NoCs on commercial FPGAs. A two-fold goal is targeted: correctness and high performance. For ease of implementation, only existing FPGA synthesis tools are used. Moreover, since asynchronous NoCs involve special asynchronous components, a comprehensive guide is introduced to map these elements correctly and efficiently. Two asynchronous NoC switches are synthesized using the proposed approach on a leading Xilinx FPGA in 28 nm: one that only handles unicast, and the other that also supports multicast. Both showed significant energy benefits with some performance gains over a state-of-the-art synchronous switch
Interconnect architectures for dynamically partially reconfigurable systems
Dynamically partially reconfigurable FPGAs (Field-Programmable Gate Arrays) allow
hardware modules to be placed and removed at runtime while other parts of the system
keep working. With their potential benefits, they have been the topic of a great
deal of research over the last decade. To exploit the partial reconfiguration capability of
FPGAs, there is a need for efficient, dynamically adaptive communication infrastructure
that automatically adapts as modules are added to and removed from the system.
Many bus and network-on-chip (NoC) architectures have been proposed to exploit this
capability on FPGA technology. However, few realizations have been reported in the
public literature to demonstrate or compare their performance in real world applications.
While partial reconfiguration can offer many benefits, it is still rarely exploited in practical
applications. Few full realizations of partially reconfigurable systems in current
FPGA technologies have been published. More application experiments are required to
understand the benefits and limitations of implementing partially reconfigurable systems
and to guide their further development. The motivation of this thesis is to fill this
research gap by providing empirical evidence of the cost and benefits of different interconnect
architectures. The results will provide a baseline for future research and will
be directly useful for circuit designers who must make a well-reasoned choice between
the alternatives.
This thesis contains the results of experiments to compare different NoC and bus interconnect
architectures for FPGA-based designs in general and dynamically partially
reconfigurable systems. These two interconnect schemes are implemented and evaluated
in terms of performance, area and power consumption using FFT (Fast Fourier
Transform) andANN(Artificial Neural Network) systems as benchmarks. Conclusions
drawn from these results include recommendations concerning the interconnect approach
for different kinds of applications. It is found that a NoC provides much better
performance than a single channel bus and similar performance to a multi-channel bus
in both parallel and parallel-pipelined FFT systems. This suggests that a NoC is a better choice for systems with multiple simultaneous communications like the FFT. Bus-based
interconnect achieves better performance and consume less area and power than NoCbased
scheme for the fully-connected feed-forward NN system. This suggests buses
are a better choice for systems that do not require many simultaneous communications
or systems with broadcast communications like a fully-connected feed-forward NN.
Results from the experiments with dynamic partial reconfiguration demonstrate that
buses have the advantages of better resource utilization and smaller reconfiguration
time and memory than NoCs. However, NoCs are more flexible and expansible. They
have the advantage of placing almost all of the communication infrastructure in the
dynamic reconfiguration region. This means that different applications running on the
FPGA can use different interconnection strategies without the overhead of fixed bus
resources in the static region.
Another objective of the research is to examine the partial reconfiguration process and
reconfiguration overhead with current FPGA technologies. Partial reconfiguration allows
users to efficiently change the number of running PEs to choose an optimal powerperformance
operating point at the minimum cost of reconfiguration. However, this
brings drawbacks including resource utilization inefficiency, power consumption overhead
and decrease in system operating frequency. The experimental results report a
50% of resource utilization inefficiency with a power consumption overhead of less
than 5% and a decrease in frequency of up to 32% compared to a static implementation.
The results also show that most of the drawbacks of partial reconfiguration implementation
come from the restrictions and limitations of partial reconfiguration design flow.
If these limitations can be addressed, partial reconfiguration should still be considered
with its potential benefits.Thesis (Ph.D.) -- University of Adelaide, School of Electrical and Electronic Engineering, 201
From MARTE to Reconfigurable NoCs: A model driven design methodology
Due to the continuous exponential rise in SoC's design complexity, there is a critical need to find new seamless methodologies and tools to handle the SoC co-design aspects. We address this issue and propose a novel SoC co-design methodology based on Model Driven Engineering and the MARTE (Modeling and Analysis of Real-Time and Embedded Systems) standard proposed by Object Management Group, to raise the design abstraction levels. Extensions of this standard have enabled us to move from high level specifications to execution platforms such as reconfigurable FPGAs. In this paper, we present a high level modeling approach that targets modern Network on Chips systems. The overall objective: to perform system modeling at a high abstraction level expressed in Unified Modeling Language (UML); and afterwards, transform these high level models into detailed enriched lower level models in order to automatically generate the necessary code for final FPGA synthesis
Performance Evaluation of XY and XTRANC Routing Algorithm for Network on Chip and Implementation using DART Simulator
In today’s world Network on Chip(NoC) is one of the most efficient on chip communication platform for System on Chip where a large amount of computational and storage blocks are integrated on a single chip. NoCs are scalable and have tackled the short commings of SoCs . In the first part of this project the basics of NoCs is explained which includes why we should use NoC , how to implement NoC ,various blocks of NoCs .The next part of the project deals with the implementation of XY routing algorithm in mesh (3*3) and mesh (4*4) network topologies. The throughput and latency curves for both the topologies were found and a through comparison was done by varying the no of virtual cannels. In the next part an improvised routing algorithm known as the extended torus(XTRANC) routing algorithm for NoCs implementation is explained. This algorithm is designed for inner torus mesh networks and provides better performance than usual routing algorithms. It has been implemented using the CONNECT simulator. Then the DART simulator was explored and two important components namely the flitqueue and the traffic generator was designed using this simulator
- …