47 research outputs found

    Combining SDM-Based Circuit Switching with Packet Switching in a Router for On-Chip Networks

    Get PDF
    A Hybrid router architecture for Networks-on-Chip “NoC” is presented, it combines Spatial Division Multiplexing “SDM” based circuit switching and packet switching in order to efficiently and separately handle both streaming and best-effort traffic generated in real-time applications. Furthermore the SDM technique is combined with Time Division Multiplexing “TDM” technique in the circuit switching part in order to increase path diversity, thus improving throughput while sharing communication resources among multiple connections. Combining these two techniques allows mitigating the poor resource usage inherent to circuit switching. In this way Quality of Service “QoS” is easily provided for the streaming traffic through the circuit-switched sub-router while the packet-switched sub-router handles best-effort traffic. The proposed hybrid router architectures were synthesized, placed and routed on an FPGA. Results show that a practicable Network-on-Chip “NoC” can be built using the proposed router architectures. 7 × 7 mesh NoCs were simulated in SystemC. Simulation results show that the probability of establishing paths through the NoC increases with the number of sub-channels and has its highest value when combining SDM with TDM, thereby significantly reducing contention in the NoC

    Time to Rethink the Power of Packet Switching

    Get PDF
    Communication specialists all around the world are facing the same problem: shifting from circuit switching (CS) to packet switching (PS). Communication service providers are favoring “All-over-IP” technologies hoping to boost their profits by providing multimedia services. The main stakeholder in this field of the paradigm shift is the industry itself: packet switching hardware manufacturers are going to earn billions of dollars and thus pay engineers and journalists many millions for the promotion of the new paradigm. However, this drive for profit is tempered by life itself. This article is devoted to the discussion of the telecommunications development strategy. We will provide examples to illustrate the difficulties that complicate the transition from CS to PS and the move to hybrid CS+PS solutions. We are considering three areas: (1) network-on-a-chip (NoC), (2) the telecom strategy: how to build the core of the network, (3) the transition of the U.S. Defense Information System Network from SS7 signaling to IP protocol

    Efficient Connection Allocator in Network-on-Chip

    Get PDF
    As semiconductor technologies develop, a System-on-Chip (SoC) that integrates all semiconductor intellectual property (IP) cores is suggested and widely used for various applications. A traditional bus interconnection does not support transmitting data between IP cores for high performance. Because of this reason, a Network-on-Chip (NoC) has been suggested to provide an efficient and scalable solution to interconnect among all IP cores. High throughput and low latency have recently become the main important factors of NoC for achieving hard guaranteed real-time systems. In order to guarantee these factors and provide real-time service (i.e., Guaranteed Service, GS), the circuit switching (CS) approach has been widely utilized. The CS approach allocates mutually exclusive paths to transmitting data between different sources and destinations using dedicated NoC resources. However, the exclusive occupancy of the allocated path reduces the efficiency of the overall use of NoC resources. In order to solve this problem, Space-Division-Multiplexing (SDM) and Time-Division-Multiplexing (TDM) techniques have been suggested. SDM implements a circuit switching technique by assigning physically different NoC-links between different connections. Path connections of the SDM technique based on spatial resources assignment do not provide high scalability. In contrast to this, using virtual time slots for a path connection, the TDM technique can share physical links between exclusively established connections, thereby improving NoC path diversity. For all of these mentioned techniques, the factor that significantly impacts the system efficiency or performance scaling is how the path is allocated. In recent years, a dynamic connection allocation approach that can cope with highly dynamic workloads has been gaining attention due to the sudden and diverse demands of applications in real-time systems. There are two groups in the dynamic connection allocation approach. One is a distributed allocation technique, and the other is a centralized allocation technique. While distributed allocation exploits additional logic integrated into the NoC-routers for path search and allocation, the centralized approach makes use of a central unit to manage the path allocation problem. There are several algorithms for the centralized allocation technique. Trellis search-based allocation approach shows the best performance among them. Many algorithms related to centralized connection allocators have been studied extensively during the past decade. However, relatively little attention was paid to methodology in analyzing and evaluating the centralized connection allocation algorithms. In order to further develop the algorithms, it is necessary to understand and evaluate the centralized connection allocator by establishing a new analysis methodology. Thus, this thesis presents a performance analysis methodology for the trellis search-based allocation approach. Firstly, this thesis proposes a system model for analysis. Secondly, performance metrics are defined. Finally, the analysis results of each performance metric related to the trellis search-based allocation approach are presented. Through this analysis, the performance of the trellis search-based allocation approach can be accurately analyzed. Although a simulation is not performed, the upper limit of performance of the trellis search-based allocation approach can also be predicted through the analysis metrics. Additionally, we introduce the general formulation of the trellis search-based path allocation algorithm. The weight values among available paths through the branch metric and path metric are proposed to enable higher performance path connection. Furthermore, according to network size, topology, TDM, interface load delivery, and router internal storage, the performance of trellis search-based path allocation algorithms is also described. In the end, the Application Specific Instruction Processor (ASIP) hardware platform customized for the trellis search-based path allocation algorithm is presented. The shortest available and lowest-cost (SALC) path search algorithm is proposed to improve the success rate of path connection in the ASIP hardware platform. We evaluate the algorithm performance and implementation synthesis results. In order to realize the dynamic connection approach, a short execution cycle of ASIP time is essential. We develop several algorithms to achieve this short execution cycle. The first one is a rectangular region of search algorithm that allows adapting the size and form of path search region according to the particular source-destination positions and considers actual operational constraints. The average execution cycles for searching an optimum path are decreased because the unnecessary region for path-search is excluded. The second one is a path-spreading search algorithm that separates between involved routers and uninvolved routers in path search. The involved routers are selected and spread out from source to destination at each intermediate trellis-search process. The path-search overhead is considerably reduced due to the router involvements. The third one is a three-directional path-spreading search algorithm that eliminates one direction movement among four spreading movements. Because of this reason, the trellis search-based path connection algorithm, which omits the back-tracing process, can be implemented in the ASIP platform. Thus, the whole algorithm execution time can be halved. The last one is a moving regional path search algorithm that significantly reduces computation complexity by selecting a constant dimensional path-search region that affects performance and moving the region from source to destination. The moving regional path search algorithm achieves a considerable decrement of computational complexity.:1 Introduction 1 1.1 NoC-interconnect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2 Connection allocation in a Network-on-Chip 7 2.1 Circuit Switching NoCs . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.1.1 Guaranteed Service in NoCs . . . . . . . . . . . . . . . . . . . 7 2.1.2 Spatial-Division-Multiplexing technique . . . . . . . . . . . . 8 2.1.3 Time-Division-Multiplexing technique . . . . . . . . . . . . . 10 2.2 System architectures employing circuit switching NoCs . . . . . . . . 11 2.2.1 Static and dynamic connection allocation . . . . . . . . . . . 12 2.2.2 Distributed connection allocation technique . . . . . . . . . . 14 2.2.3 Centralized connection allocation technique . . . . . . . . . . 16 2.2.4 Algorithms for centralized connection allocation . . . . . . . . 17 2.2.4.1 Software based run-time path allocation approach . 18 2.2.4.2 Trellis search-based allocation approach . . . . . . . 19 3 Performance analysis methodology for a centralized connection allocator 23 3.1 System model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.2 Performance metrics and analysis methodology . . . . . . . . . . . . 25 3.3 System simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4 Trellis search-based path allocation algorithm 45 4.1 General formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.1.1 Trellis graph structure . . . . . . . . . . . . . . . . . . . . . . 45 4.1.2 Survivor path selection criterion . . . . . . . . . . . . . . . . . 52 ix 4.1.2.1 Branch metric and path metric . . . . . . . . . . . . 52 4.1.2.2 The shortest-available and lowest-cost path selection criterion . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.2 Algorithm Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.2.1 Network topology . . . . . . . . . . . . . . . . . . . . . . . . 55 4.2.2 Network size . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 4.2.3 Time-Division-Multiplexing . . . . . . . . . . . . . . . . . . . 61 4.2.4 NoC interface load diversity . . . . . . . . . . . . . . . . . . . 63 4.2.5 The internal storage of the router . . . . . . . . . . . . . . . . 66 5 ASIP approach for Trellis search-based connection allocation 73 5.1 System model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 5.1.1 Trellis search-based ASIP platform architecture . . . . . . . . 74 5.2 Algorithm for improving success rates of path connection . . . . . . . 81 5.2.1 SALC algorithm for Trellis search-based ASIP platform . . . . 81 5.2.2 Performance evaluation of the SALC algorithm . . . . . . . . 88 5.2.2.1 Simulation results . . . . . . . . . . . . . . . . . . . 88 5.2.2.2 Synthesis results . . . . . . . . . . . . . . . . . . . . 91 5.3 Algorithm for reducing path-search time . . . . . . . . . . . . . . . . 93 5.3.1 Rectangular regional path search algorithm . . . . . . . . . . 93 5.3.2 Path-spreading search algorithm . . . . . . . . . . . . . . . . 99 5.3.3 Three directional path-spreading search algorithm . . . . . . 108 5.3.4 Moving regional path search algorithm . . . . . . . . . . . . . 114 5.3.5 Performance evaluation . . . . . . . . . . . . . . . . . . . . . 123 5.3.5.1 Simulation results . . . . . . . . . . . . . . . . . . . 123 5.3.5.2 Synthesis results . . . . . . . . . . . . . . . . . . . . 126 6 Conclusion and Future work 131 6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 6.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 Bibliography 13

    Advanced Connection Allocation Techniques in Circuit Switching Network on Chip

    Get PDF
    With the advancement of semiconductor technology, the System on Chip (SoC) is becoming more and more complex, so the on-chip communication has become a bottleneck of SoC Design. Since the traditional bus system is inefficient and not scalable, the Network-On-Chip (NoC) has emerged as the promising communication mechanism for complex SoCs. As some systems have specific performance requirements, such as a minimum throughput (for real-time streaming data) or bounded latency (for interrupts, process synchronization, etc), communication with Guaranteed Service (GS) support becomes crucial for predictable SoC architectures. Circuit Switching (CS) is a popular approach to support GS, which firstly has to allocate an exclusively connection (circuit) between the source and destination nodes, and then the data packets are delivered over this connection. However, it is inefficient and inflexible because the resource is occupied by single connection during its whole lifetime, which can block other communications. Hence, two extensions of CS have been proposed to share resources: i) Time-Division Multiplexing (TDM), in which the available link capacity is split into multiple time slots to be shared by different flows in TDM scheme; and ii) Space-Division-Multiplexing (SDM), in which only a subset (sub-channel) of the link wires is exclusively allocated to a specific connection, while the remaining wires of the link can be used by other flows. The connection allocation is critical for CS, since the data delivery can start only after the associated connection is allocated. In this thesis, we propose a dedicated hardware connection allocator to solve the dynamic connection allocation problem for CS NoCs, which has to i) allocate a contention-free path between source-destination pairs and ii) allocate appropriate portions of link bandwidth (appropriate number of time slots and subsets) along the path. The dedicated connection allocator, called NoCManager, solves the connection allocation problem by employing a trellis-search based shortest path algorithm. The trellis search can explore all possible paths between source node and destination. Moreover, it shall find the requested path in a fixed low latency and can guarantee the path optimality in terms of path length if the path is available. In this thesis, two different trellis graphs, Forward-Backtrack trellis and Register-Exchange trellis are proposed. The Forward-Backtrack trellis completes the path search in two steps: forward search and backtracking. Firstly, the forward search begins at source node that traverses the network to find the free path. When destination node is reached, the backtrack starts from destination to select the survivor path and collect the associated path parameters. However, Register-Exchange trellis saves the entire survivor path sequences during forward search. Consequently, the backtracking step can be omitted, and thus the allocation time is halved compared to forward-backtrack approaches. Moreover, each trellis graph consists of three categories, unfolded structure, folded structure and bidirectional structure. The unfolded structure can provide high allocation speed while folded structure is more efficient from a hardware point of view. The bidirectional structure starts the search at two sides, source node and destination node simultaneously, so the allocation speed is 2 times faster than previous unidirectional search. Furthermore, in order to address the scalability issue of previous centralized systems, the partitioned architecture (i.e. spatial partitioning technique) is proposed to divide the large system into multiple smaller differentiated logical partitions served by local NoCManagers. This partitioning technique keeps the request load of the manager and manager-node communication overhead moderate. Inside each partition, the path search problem is solved by a local manager with trellis-search algorithm. To establish a path that crosses partitions, the managers communicate with each other in distributed manner to converge the global path. In order to further enhance the path diversity and resource utilization, we adopt the combined TDM and SDM technique. In combined TDM-SDM approach, each SDM sub-channel is split into multiple time slots so that can be shared by multiple flows. Hence, the number of sub-channels can be kept moderate to reduce router complexity, while still providing higher path diversity than TDM scheme. In order to investigate and optimize TDM-SDM partitioning strategy, we studied the influence of different TDM-SDM link partitioning strategies on success rate and path length that allowed us to find the optimal solution. The dedicated connection allocator using the trellis-search algorithm is employed for TDM, SDM and TDM-SDM CS. In the end, we present the router architecture that combines the circuit-switching network (for GS communication) and packet-switching network (for best-effort communication)

    A Scalable and Adaptive Network on Chip for Many-Core Architectures

    Get PDF
    In this work, a scalable network on chip (NoC) for future many-core architectures is proposed and investigated. It supports different QoS mechanisms to ensure predictable communication. Self-optimization is introduced to adapt the energy footprint and the performance of the network to the communication requirements. A fault tolerance concept allows to deal with permanent errors. Moreover, a template-based automated evaluation and design methodology and a synthesis flow for NoCs is introduced

    Dynamic Optical Networks for Data Centres and Media Production

    Get PDF
    This thesis explores all-optical networks for data centres, with a particular focus on network designs for live media production. A design for an all-optical data centre network is presented, with experimental verification of the feasibility of the network data plane. The design uses fast tunable (< 200 ns) lasers and coherent receivers across a passive optical star coupler core, forming a network capable of reaching over 1000 nodes. Experimental transmission of 25 Gb/s data across the network core, with combined wavelength switching and time division multiplexing (WS-TDM), is demonstrated. Enhancements to laser tuning time via current pre-emphasis are discussed, including experimental demonstration of fast wavelength switching (< 35 ns) of a single laser between all combinations of 96 wavelengths spaced at 50 GHz over a range wider than the optical C-band. Methods of increasing the overall network throughput by using a higher complexity modulation format are also described, along with designs for line codes to enable pulse amplitude modulation across the WS-TDM network core. The construction of an optical star coupler network core is investigated, by evaluating methods of constructing large star couplers from smaller optical coupler components. By using optical circuit switches to rearrange star coupler connectivity, the network can be partitioned, creating independent reserves of bandwidth and resulting in increased overall network throughput. Several topologies for constructing a star from optical couplers are compared, and algorithms for optimum construction methods are presented. All of the designs target strict criteria for the flexible and dynamic creation of multicast groups, which will enable future live media production workflows in data centres. The data throughput performance of the network designs is simulated under synthetic and practical media production traffic scenarios, showing improved throughput when reconfigurable star couplers are used compared to a single large star. An energy consumption evaluation shows reduced network power consumption compared to incumbent and other proposed data centre network technologies

    Spatial parallelism in the routers of asynchronous on-chip networks

    Get PDF
    State-of-the-art multi-processor systems-on-chip use on-chip networks as their communication fabric. Although most on-chip networks are implemented synchronously, asynchronous on-chip networks have several advantages over their synchronous counterparts. Timing division multiplexing (TDM) flow control methods have been utilized in asynchronous on-chip networks extensively. The synchronization required by TDM leads to significant speed penalties. Compared with using TDM methods, spatial parallelism methods, such as the spatial division multiplexing (SDM) flow control method, achieve better network throughput with less area overhead.This thesis proposes several techniques to increase spatial parallelism in the routers of asynchronous on-chip networks.Channel slicing is a new pipeline structure that alleviates the speed penalty by removing the synchronization among bit-level data pipelines. It is also found out that the lookahead pipeline using early evaluated acknowledgement can be used in routers to further improve speed.SDM is a new flow control method proposed for asynchronous on-chip networks. It improves network throughput without introducing synchronization among buffers of different frames, which is required by TDM methods. It is also found that the area overhead of SDM is smaller than the virtual channel (VC) flow control method -- the most used TDM method. The major design problem of SDM is the area consuming crossbars. A novel 2-stage Clos switch structure is proposed to replace the crossbar in SDM routers, which significantly reduces the area overhead. This Clos switch is dynamically reconfigured by a new asynchronous Clos scheduler.Several asynchronous SDM routers are implemented using these new techniques. An asynchronous VC router is also reproduced for comparison. Performance analyses show that the SDM routers outperform the VC router in throughput, area overhead and energy efficiency.EThOS - Electronic Theses Online ServiceGBUnited Kingdo
    corecore