Efficient Connection Allocator in Network-on-Chip
As semiconductor technologies develop, the System-on-Chip (SoC), which integrates semiconductor intellectual property (IP) cores on a single chip, has been proposed and is widely used in various applications. A traditional bus interconnect cannot support high-performance data transfer between IP cores. For this reason, the Network-on-Chip (NoC) has been proposed as an efficient and scalable way to interconnect all IP cores. High throughput and low latency have recently become the most important NoC requirements for achieving hard real-time guarantees. To guarantee these properties and provide a real-time service (Guaranteed Service, GS), the circuit switching (CS) approach has been widely used. CS allocates mutually exclusive paths, built from dedicated NoC resources, for transmitting data between different sources and destinations. However, exclusive occupancy of the allocated path reduces the overall utilization efficiency of NoC resources. To solve this problem, Space-Division-Multiplexing (SDM) and Time-Division-Multiplexing (TDM) techniques have been proposed. SDM implements circuit switching by assigning physically different NoC links to different connections; because its path connections rely on the assignment of spatial resources, SDM does not scale well. In contrast, TDM assigns virtual time slots to each path connection, so exclusively established connections can share physical links, thereby improving NoC path diversity.
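To make the link-sharing idea of TDM concrete, the following minimal Python sketch reserves time slots for connections along a path. It assumes (this is not stated in the abstract) that every link has a slot table of `T` entries and that a connection using slot `s` on one hop uses slot `(s + 1) mod T` on the next hop, a common pipelined-slot convention in TDM NoCs. The class and method names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Link = Tuple[str, str]  # directed link (from_router, to_router)


class SlotTables:
    """Per-link TDM slot tables (hypothetical helper for illustration)."""

    def __init__(self, num_slots: int) -> None:
        self.num_slots = num_slots
        # link -> owner of each slot; None marks a free slot
        self.tables: Dict[Link, List[Optional[str]]] = {}

    def _table(self, link: Link) -> List[Optional[str]]:
        return self.tables.setdefault(link, [None] * self.num_slots)

    def try_reserve(self, conn: str, path: List[Link]) -> Optional[int]:
        """Reserve slots for `conn` along `path`; return the start slot or None.

        Assumed convention: slot s on hop i becomes slot (s + 1) mod T on hop i + 1.
        """
        for start in range(self.num_slots):
            slots = [(start + hop) % self.num_slots for hop in range(len(path))]
            if all(self._table(link)[s] is None for link, s in zip(path, slots)):
                for link, s in zip(path, slots):
                    self._table(link)[s] = conn
                return start
        return None  # every start slot collides somewhere on the path


# Two connections share the physical link r0->r1 in different time slots.
tdm = SlotTables(num_slots=4)
slot_a = tdm.try_reserve("A", [("r0", "r1"), ("r1", "r2")])  # slots 0 and 1
slot_b = tdm.try_reserve("B", [("r0", "r1")])                # slot 1 on r0->r1
```

Under SDM, connection B would need a physically separate link (or wire subset); here it simply takes a different slot on the same link until all `T` slots are used.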
For all of these techniques, the factor that most strongly affects system efficiency and performance scaling is how paths are allocated. In recent years, dynamic connection allocation, which can cope with highly dynamic workloads, has been gaining attention because of the sudden and diverse demands of applications in real-time systems. Dynamic connection allocation techniques fall into two groups: distributed and centralized. Distributed allocation uses additional logic integrated into the NoC routers for path search and allocation, whereas centralized allocation uses a central unit to manage the path allocation problem. Several algorithms exist for centralized allocation; among them, the trellis search-based allocation approach shows the best performance.
Many algorithms for centralized connection allocators have been studied extensively over the past decade. However, relatively little attention has been paid to methodologies for analyzing and evaluating centralized connection allocation algorithms. To develop these algorithms further, a new analysis methodology is needed to understand and evaluate centralized connection allocators. This thesis therefore presents a performance analysis methodology for the trellis search-based allocation approach. First, it proposes a system model for the analysis. Second, it defines performance metrics. Finally, it presents the analysis results for each performance metric of the trellis search-based allocation approach. This analysis allows the performance of the trellis search-based allocation approach to be characterized accurately, and the upper limit of its performance can be predicted from the analysis metrics without running a simulation. Additionally, we introduce a general formulation of the trellis search-based path allocation algorithm, in which branch metrics and path metrics are proposed to weight the available paths and thereby enable higher-performance path connections. Furthermore, the performance of trellis search-based path allocation algorithms is described as a function of network topology, network size, TDM, NoC interface load diversity, and router internal storage.
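The trellis search can be pictured as a Viterbi-style search: stage k of the trellis holds, for every router reachable in exactly k hops, one survivor path with the smallest path metric, and the final path is recovered by back-tracing. The Python sketch below assumes a 2D mesh, an infinite branch metric for links that are already allocated, and a per-link cost (default 1) otherwise. The thesis's actual branch and path metric definitions are not given in the abstract, so these choices, like the function names, are illustrative.

```python
import math
from typing import Dict, Iterator, List, Optional, Set, Tuple

Node = Tuple[int, int]    # (x, y) router position in a width x height mesh
Link = Tuple[Node, Node]  # directed link between adjacent routers


def mesh_neighbors(n: Node, width: int, height: int) -> Iterator[Node]:
    x, y = n
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)


def trellis_search(width: int, height: int, src: Node, dst: Node,
                   busy: Set[Link],
                   cost: Optional[Dict[Link, float]] = None
                   ) -> Optional[Tuple[List[Node], float]]:
    """Return (path, path metric) of the fewest-hop free path, ties broken by
    the lowest path metric, or None if dst cannot be reached."""
    if src == dst:
        return [src], 0.0
    cost = cost or {}
    metric: Dict[Node, float] = {src: 0.0}  # survivor path metrics at this stage
    back: List[Dict[Node, Node]] = []       # per-stage back-pointers
    for _ in range(width * height - 1):     # no simple path is longer than this
        next_metric: Dict[Node, float] = {}
        stage_ptr: Dict[Node, Node] = {}
        for u, pm in metric.items():
            for v in mesh_neighbors(u, width, height):
                if (u, v) in busy:
                    continue  # branch metric = infinity: link already allocated
                candidate = pm + cost.get((u, v), 1.0)  # path metric += branch metric
                if candidate < next_metric.get(v, math.inf):
                    next_metric[v] = candidate  # keep only the best survivor
                    stage_ptr[v] = u
        if not next_metric:
            return None
        back.append(stage_ptr)
        metric = next_metric
        if dst in metric:  # first stage that reaches dst has the fewest hops
            path = [dst]
            for ptrs in reversed(back):  # back-trace through the survivors
                path.append(ptrs[path[-1]])
            path.reverse()
            return path, metric[dst]
    return None


free = trellis_search(3, 3, (0, 0), (2, 0), busy=set())
# ([(0, 0), (1, 0), (2, 0)], 2.0)
detour = trellis_search(3, 3, (0, 0), (2, 0), busy={((1, 0), (2, 0))})
# ([(0, 0), (1, 0), (1, 1), (2, 1), (2, 0)], 4.0)
```

Because a path that revisits a router could always be shortened, the first stage that reaches the destination yields a simple path.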
Finally, an Application-Specific Instruction-set Processor (ASIP) hardware platform customized for the trellis search-based path allocation algorithm is presented. The shortest-available and lowest-cost (SALC) path search algorithm is proposed to improve the success rate of path connection on the ASIP hardware platform. We evaluate the algorithm's performance and report implementation synthesis results. To realize dynamic connection allocation, a short ASIP execution time is essential.
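Read from its name and from the table of contents entry "The shortest-available and lowest-cost path selection criterion", SALC appears to rank candidate paths lexicographically: prefer the shortest available path and, among equally short ones, the lowest-cost path. The Python sketch below encodes only that ordering. What "cost" measures in the thesis is not stated in the abstract, so the numeric cost used here is an assumption, and `salc_select` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple

Node = Tuple[int, int]
Candidate = Tuple[List[Node], float]  # (available path, path cost)


def salc_select(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Pick the shortest available path; break ties by the lowest cost."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: (len(c[0]) - 1, c[1]))  # (hop count, cost)


candidates = [
    ([(0, 0), (1, 0), (1, 1)], 3.0),                  # 2 hops, cost 3
    ([(0, 0), (0, 1), (1, 1)], 2.0),                  # 2 hops, cost 2
    ([(0, 0), (0, 1), (0, 2), (1, 2), (1, 1)], 0.5),  # 4 hops: cheapest but longer
]
best = salc_select(candidates)  # the 2-hop path through (0, 1)
```

Ranking length first keeps link usage per connection low, leaving more resources free for later requests; cost then separates equally short alternatives.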
We develop several algorithms to shorten this execution time. The first is a rectangular regional path search algorithm, which adapts the size and shape of the path-search region to the particular source and destination positions and takes actual operational constraints into account. Because regions that are unnecessary for the path search are excluded, the average number of execution cycles needed to find an optimal path decreases. The second is a path-spreading search algorithm, which distinguishes routers involved in the path search from uninvolved ones. At each intermediate trellis-search step, the involved routers are selected and spread out from the source toward the destination; because only involved routers take part, the path-search overhead is considerably reduced. The third is a three-directional path-spreading search algorithm, which eliminates one of the four spreading directions. This allows a trellis search-based path connection algorithm that omits the back-tracing process to be implemented on the ASIP platform, halving the total algorithm execution time. The last is a moving regional path search algorithm, which selects a path-search region of constant size (a size that affects performance) and moves it from the source toward the destination, considerably reducing computational complexity.
1 Introduction
1.1 NoC-interconnect
1.2 Thesis outline
2 Connection allocation in a Network-on-Chip
2.1 Circuit Switching NoCs
2.1.1 Guaranteed Service in NoCs
2.1.2 Spatial-Division-Multiplexing technique
2.1.3 Time-Division-Multiplexing technique
2.2 System architectures employing circuit switching NoCs
2.2.1 Static and dynamic connection allocation
2.2.2 Distributed connection allocation technique
2.2.3 Centralized connection allocation technique
2.2.4 Algorithms for centralized connection allocation
2.2.4.1 Software based run-time path allocation approach
2.2.4.2 Trellis search-based allocation approach
3 Performance analysis methodology for a centralized connection allocator
3.1 System model
3.2 Performance metrics and analysis methodology
3.3 System simulation
4 Trellis search-based path allocation algorithm
4.1 General formulation
4.1.1 Trellis graph structure
4.1.2 Survivor path selection criterion
4.1.2.1 Branch metric and path metric
4.1.2.2 The shortest-available and lowest-cost path selection criterion
4.2 Algorithm Properties
4.2.1 Network topology
4.2.2 Network size
4.2.3 Time-Division-Multiplexing
4.2.4 NoC interface load diversity
4.2.5 The internal storage of the router
5 ASIP approach for Trellis search-based connection allocation
5.1 System model
5.1.1 Trellis search-based ASIP platform architecture
5.2 Algorithm for improving success rates of path connection
5.2.1 SALC algorithm for Trellis search-based ASIP platform
5.2.2 Performance evaluation of the SALC algorithm
5.2.2.1 Simulation results
5.2.2.2 Synthesis results
5.3 Algorithm for reducing path-search time
5.3.1 Rectangular regional path search algorithm
5.3.2 Path-spreading search algorithm
5.3.3 Three directional path-spreading search algorithm
5.3.4 Moving regional path search algorithm
5.3.5 Performance evaluation
5.3.5.1 Simulation results
5.3.5.2 Synthesis results
6 Conclusion and Future work
6.1 Conclusion
6.2 Future work
Bibliography
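Two of the search-time reductions described in the abstract above can be illustrated together: restricting the search to a rectangle spanned by the source and destination, and spreading the search outward from the source so that only routers reached in the previous step (the involved routers) are processed. The Python sketch below is a simplification under stated assumptions: a 2D mesh, a breadth-first frontier in place of the full trellis metrics, and a hypothetical `margin` parameter for enlarging the rectangle. It is not the ASIP implementation.

```python
from typing import Optional, Set, Tuple

Node = Tuple[int, int]
Link = Tuple[Node, Node]


def search_region(src: Node, dst: Node, width: int, height: int,
                  margin: int = 0) -> Tuple[int, int, int, int]:
    """Rectangle (x0, x1, y0, y1) spanned by src and dst, grown by `margin`
    routers on every side and clipped to the mesh."""
    x0 = max(0, min(src[0], dst[0]) - margin)
    x1 = min(width - 1, max(src[0], dst[0]) + margin)
    y0 = max(0, min(src[1], dst[1]) - margin)
    y1 = min(height - 1, max(src[1], dst[1]) + margin)
    return x0, x1, y0, y1


def spreading_search(src: Node, dst: Node, width: int, height: int,
                     busy: Set[Link], margin: int = 0) -> Tuple[Optional[int], int]:
    """Spread hop by hop from src, expanding only the frontier routers and only
    inside the search region. Returns (hops to dst or None, router expansions)."""
    if src == dst:
        return 0, 0
    x0, x1, y0, y1 = search_region(src, dst, width, height, margin)
    visited = {src}
    frontier = [src]
    expansions = 0
    hops = 0
    while frontier:
        hops += 1
        next_frontier = []
        for u in frontier:
            expansions += 1  # only involved (frontier) routers do work
            x, y = u
            for v in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                inside = x0 <= v[0] <= x1 and y0 <= v[1] <= y1
                if not inside or v in visited or (u, v) in busy:
                    continue
                if v == dst:
                    return hops, expansions
                visited.add(v)
                next_frontier.append(v)
        frontier = next_frontier
    return None, expansions  # no free path inside the region


near = spreading_search((1, 1), (3, 2), 8, 8, busy=set())             # (3, 4)
blocked = spreading_search((0, 0), (2, 0), 8, 8, {((1, 0), (2, 0))})  # (None, 2)
widened = spreading_search((0, 0), (2, 0), 8, 8, {((1, 0), (2, 0))}, margin=1)  # (4, 5)
```

In the `near` case only 4 of the 64 routers are expanded. The `blocked`/`widened` pair shows the trade-off: a tight rectangle does the least work but can miss a detour that a slightly larger region finds.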
Recommended from our members
Behavioral synthesis from VHDL using structured modeling
This dissertation describes work in behavioral synthesis involving the development of a VHDL Synthesis System (VSS), which accepts a VHDL behavioral input specification and performs technology-independent synthesis to generate a circuit netlist of generic components. The VHDL language is used for both input and output descriptions. An intermediate representation that incorporates signal typing and component attributes simplifies compilation and facilitates design optimization. A Structured Modeling methodology has been developed to suggest standard VHDL modeling practices for synthesis. Structured Modeling provides recommendations for using the available VHDL description styles so that optimal designs will be synthesized. A design composed of generic components is synthesized from the input description through a process of Graph Compilation, Graph Criticism, and Design Compilation. Experiments were performed to demonstrate the effects of different modeling styles on the quality of the designs produced by VSS. Several alternative VHDL models were examined for each benchmark, illustrating the improvements in design quality achieved when Structured Modeling guidelines were followed.
CHASSIS : a combined hardware selection and scheduling technique for performance driven synthesis
This report describes a new technique that combines the Hardware Scheduling and Component Selection phases of High-Level Synthesis. Our technique simultaneously selects components from a given library while it schedules the operations into different control steps. The algorithm improves on previous scheduling work because component costs and performance are considered during the scheduling process, enlarging the design search space and resulting in better-optimized designs.
A Multi-objective Perspective for Operator Scheduling using Fine-grained DVS Architecture
The stringent power budget of fine-grained power-managed digital integrated circuits has driven chip designers to optimize power at the cost of area and delay, which were the traditional cost criteria for circuit optimization. This emerging scenario motivates us to revisit the classical operator scheduling problem when DVFS-enabled functional units that can trade off cycles for power are available. We study the design space defined by this trade-off and present a branch-and-bound (B/B) algorithm to explore this state space and report the Pareto-optimal front with respect to area and power. The scheduling also aims at maximum resource sharing and is able to attain sufficient area and power gains for complex benchmarks when timing constraints are relaxed by a sufficient amount. Experimental results show that the algorithm, operating without any user constraint (area/power), is able to solve the problem for most available benchmarks, and that using power-budget or area-budget constraints leads to significant performance gains.
Comment: 18 pages, 6 figures, International Journal of VLSI Design & Communication Systems (VLSICS)
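The final reporting step described in this abstract, keeping only the Pareto-optimal schedules with respect to area and power, can be sketched in a few lines of Python. The branch-and-bound exploration itself is not shown, and the schedule labels and numbers are made up for illustration.

```python
from typing import List, Sequence, Tuple

Design = Tuple[str, float, float]  # (label, area, power); lower is better for both


def dominates(a: Design, b: Design) -> bool:
    """True if `a` is no worse than `b` in area and power and strictly better in one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])


def pareto_front(designs: Sequence[Design]) -> List[Design]:
    """Non-dominated designs, sorted by increasing area."""
    front = [d for d in designs if not any(dominates(o, d) for o in designs)]
    return sorted(front, key=lambda d: (d[1], d[2]))


schedules = [
    ("s1", 10.0, 5.0),
    ("s2", 8.0, 7.0),
    ("s3", 12.0, 4.0),
    ("s4", 11.0, 6.0),  # dominated by s1
    ("s5", 8.0, 8.0),   # dominated by s2
]
front = pareto_front(schedules)  # s2, s1, s3
```

This quadratic filter is fine for a handful of points; a search that generates many candidates would normally prune dominated partial schedules during exploration instead.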
The synthesis of a hardware scheduler for Non-Manifest Loops
This paper addresses the hardware implementation of a dynamic scheduler for non-manifest, data-dependent periodic loops. Static scheduling techniques, which are known to give near-optimal scheduling solutions for manifest loops, fail at scheduling non-manifest loops because they lack the run-time information needed to make a static schedule feasible. In this paper, a dynamic scheduling approach is chosen to circumvent this problem. We present a case study using VHDL in which the focus lies on implementations with minimal memory usage and low communication overhead between the various components of the architecture. This has resulted in an efficient and synthesisable system.
Chippe : a system for constraint driven behavioral synthesis
This report describes the Chippe system, gives some background on previous work, and describes several sample design runs of the system. Also presented are the sources of the design tradeoffs used by Chippe, an overview of the internal design model, and experiences using the system.
A survey of behavioral-level partitioning systems
Many approaches have been developed to partition a system's behavioral description before a structural implementation is synthesized. We highlight the foundations and motivations for behavioral partitioning. We survey behavioral partitioning approaches, discussing the abstraction levels, goals, major steps, and key assumptions of each.
Layout-driven allocation for high level synthesis
We propose a hypergraph model and a new algorithm for hardware allocation. The use of a hypergraph model facilitates the identification of sharable resources and the calculation of interconnect costs. Using the hypergraph model, the algorithm performs interconnect optimization by simultaneously taking into account the interdependent relationships among three allocation subtasks: register, operation, and interconnect allocation. Previous algorithms considered these three tasks serially. Another novel contribution of our algorithm is the exploration of the design space by trading off storage units against interconnects. We also demonstrate that traditional cost functions using the number of registers and the number of mux inputs cannot guarantee minimal area. To rectify the problem, we introduce a new layout area cost function and compare it with the traditional cost functions. Our experiments show that our algorithm is superior to previously published algorithms under traditional cost functions.
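The "number of mux inputs" term in the traditional cost functions mentioned in this abstract can be computed from a binding of data transfers to hardware ports. The Python sketch below uses one common counting convention, which is an assumption rather than something taken from the abstract: a port fed by k > 1 distinct sources needs a k-input multiplexer, and a port with a single source needs none. Register, unit, and port names and the cost weights are hypothetical, and this is not the paper's layout area cost function.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Transfer = Tuple[str, str]  # (source, destination port), e.g. ("R1", "ALU1.a")


def mux_inputs(transfers: Iterable[Transfer]) -> int:
    """Total multiplexer inputs implied by a set of register/unit transfers."""
    sources: Dict[str, Set[str]] = defaultdict(set)
    for src, port in transfers:
        sources[port].add(src)  # repeated transfers over the same wire count once
    return sum(len(s) for s in sources.values() if len(s) > 1)


def traditional_cost(num_registers: int, transfers: Iterable[Transfer],
                     reg_weight: float = 1.0, mux_weight: float = 1.0) -> float:
    """Weighted register count plus mux-input count (weights are illustrative)."""
    return reg_weight * num_registers + mux_weight * mux_inputs(transfers)


binding = [
    ("R1", "ALU1.a"), ("R2", "ALU1.a"),      # ALU1.a needs a 2-input mux
    ("R3", "ALU1.b"),                        # single source: direct wire
    ("ALU1.out", "R1"), ("MUL1.out", "R1"),  # R1 input needs a 2-input mux
    ("R1", "ALU1.a"),                        # same wire as above, not a new input
]
```

For this binding with three registers, `mux_inputs(binding)` is 4 and `traditional_cost(3, binding)` is 7.0. Neither number reflects wire length or placement, which is the gap the abstract's layout area cost function addresses.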