Energy-efficient electrical and silicon-photonic networks in many core systems
Thesis (Ph.D.)--Boston University. During the past decade, the very large scale integration (VLSI) community has migrated towards incorporating multiple cores on a single chip to sustain the historic performance improvement in computing systems. As the core count continuously increases, the performance of the network-on-chip (NoC), which is responsible for the communication between cores, caches and memory controllers, is becoming increasingly critical for sustaining the performance improvement. In this dissertation, we propose several methods to improve the energy efficiency of both electrical and silicon-photonic NoCs. First, for electrical NoCs, we propose a flow control technique, Express Virtual Channel with Taps (EVC-T), to transmit both broadcast and data packets efficiently in a mesh network. A low-latency notification tree network is included to maintain the order of broadcast packets. The EVC-T technique improves the NoC latency by 24% and the system energy efficiency in terms of energy-delay product (EDP) by 13%. In the near future, silicon-photonic links are projected to replace electrical links for global on-chip communication due to their lower data-dependent power and higher bandwidth density, but the high laser power can more than offset these advantages. Therefore, we propose a silicon-photonic multi-bus NoC architecture and a methodology that can reduce the laser power by 49% on average through bandwidth reconfiguration at runtime based on the variations in bandwidth requirements of applications. We also propose a technique to reduce the laser power by dynamically activating/deactivating the L2 cache banks and switching ON/OFF the corresponding silicon-photonic links in a crossbar NoC. This cache-reconfiguration based technique saves laser power by 23.8% and improves system EDP by 5.52% on average.
In addition, we propose a methodology for placing and sharing on-chip laser sources by jointly considering the bandwidth requirements, thermal constraints and physical layout constraints. Our proposed methodology for placing and sharing on-chip laser sources reduces laser power. Beyond reducing the laser power to improve the energy efficiency of silicon-photonic NoCs, we propose to leverage the large bandwidth provided by the silicon-photonic NoC to share computing resources. The global sharing of floating-point units can save system area by 13.75% and system power by 10%.
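The abstract above reports its gains as energy-delay product (EDP), the product of energy consumed and execution time. As a minimal sketch of how a relative EDP improvement is computed, the following uses hypothetical function names and hypothetical input numbers; none of the values come from the thesis.

```python
def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product; a lower value means a more efficient design."""
    return energy_joules * delay_seconds


def edp_improvement(base_energy: float, base_delay: float,
                    new_energy: float, new_delay: float) -> float:
    """Fractional EDP reduction versus the baseline (0.13 means 13% lower)."""
    return 1.0 - edp(new_energy, new_delay) / edp(base_energy, base_delay)


# Hypothetical, normalized inputs (not taken from the thesis):
# 5% less energy and 10% less delay than the baseline.
improvement = edp_improvement(1.0, 1.0, 0.95, 0.90)
print(f"EDP improvement: {improvement:.1%}")
```

Because EDP multiplies the two quantities, a modest saving in each compounds into a larger combined improvement.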
Providing quality of service over high speed electronic and optical switches
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003. Includes bibliographical references (leaves 235-239). This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. In a network, multiple links are interconnected by means of switches. A switch is a device with multiple input and output links, and its job is to move data from the input links to the output links. In this thesis, we focus on a number of fundamental issues concerning the quality of service provided by electronic and optical switches. We discuss various mechanisms that enable the support of quality of service requirements. In particular, we explore fundamental limitations of current high speed packet switches and develop new techniques and architectures that make possible the provision of certain service guarantees. We then study optical wavelength switches and illustrate how similar ideas can be applied in a manner consistent with the current state of optical switching technology. First, we focus on providing rate guarantees over packet switches. We develop a method called rate quantization which converts the set of desired rates into a certain discrete set such that the quality of service guarantees can be greatly improved with a small resource speedup. Moreover, quantization simplifies rate provisioning for dynamically changing traffic demands since it allows service opportunities for different input output link pairs to be scheduled with minimal dependence. We illustrate an isomorphism between packet switch schedulers and Clos networks to develop such schedulers. Next, we evaluate the amount of resource speedup necessary for single stage switches to support multicast rates. This speedup limits the scalability of a single stage multicast switch a great deal.
We present an in-depth study of multistage switches and propose a number of architectures, along with associated routing and scheduling algorithms. We illustrate how the presence of multiple paths between input output pairs can be exploited to improve the performance of a switch and simplify the scheduling algorithms. Some of our architectures are capable of providing multicast rate guarantees without a need for a resource speedup. We extend our results on switch schedulers and use them for providing service guarantees over optical wavelength switches. We take the limitations of the optical crossconnects and the unavailability of optical memory technology into account, and modify the procedures we developed for electronic switches to make them suitable for various optical wavelength switches. These results provide an understanding of when to move optical switching closer to the end users for an efficient utilization of resources in networks with both optical and electronic technologies. by Can Emre Koksal. Ph.D.
Upper Bound Analysis and Routing in Optical Benes Networks
Multistage Interconnection Networks (MINs) are popular in switching and communication applications. They have been used in telecommunication and parallel computing systems for many years. The new challenge facing optical MINs is crosstalk, which is caused by the coupling of two signals within a switching element. Crosstalk is not a significant issue in the electrical domain, but due to the stringent Bit Error Rate (BER) constraint, it is a major concern in the optical domain. In this dissertation, we study the blocking probability in the optical network and the deterministic conditions for strictly non-blocking Vertical Stacked Optical Benes Networks (VSOBN) with and without worst-case scenarios. We establish the upper bound on the blocking probability of Vertical Stacked Optical Benes Networks with respect to the number of planes used when the non-blocking requirement is not met. We then study routing in WDM Benes networks and propose a new routing algorithm so that the number of wavelengths can be reduced. Since routing in a WDM optical network is an NP-hard problem, many researchers have designed heuristic algorithms to perform this routing. We also develop a genetic algorithm, a simulated annealing algorithm and an ant colony technique and apply these AI algorithms to route the connections in a WDM Benes network.
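For orientation on the topology discussed above, a Benes network with N ports (N a power of two) built from 2×2 switching elements has 2·log2(N) − 1 stages of N/2 elements each. The sketch below computes those dimensions and the element count when several identical planes are stacked vertically. The function names are illustrative assumptions, and the code does not reproduce the dissertation's blocking analysis.

```python
from typing import Tuple


def benes_dimensions(n_ports: int) -> Tuple[int, int, int]:
    """Return (stages, elements_per_stage, total_elements) for an
    n_ports x n_ports Benes network built from 2x2 switching elements."""
    if n_ports < 2 or n_ports & (n_ports - 1):
        raise ValueError("port count must be a power of two and at least 2")
    log_n = n_ports.bit_length() - 1      # exact log2 for a power of two
    stages = 2 * log_n - 1
    per_stage = n_ports // 2
    return stages, per_stage, stages * per_stage


def stacked_elements(n_ports: int, planes: int) -> int:
    """Total 2x2 elements when `planes` identical Benes planes are stacked."""
    if planes < 1:
        raise ValueError("at least one plane is required")
    return planes * benes_dimensions(n_ports)[2]


stages, per_stage, total = benes_dimensions(8)
print(stages, per_stage, total)
```

Stacking planes trades hardware for lower blocking: each added plane multiplies the element count by the same per-plane total.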
Spatial parallelism in the routers of asynchronous on-chip networks
State-of-the-art multi-processor systems-on-chip use on-chip networks as their communication fabric. Although most on-chip networks are implemented synchronously, asynchronous on-chip networks have several advantages over their synchronous counterparts. Timing division multiplexing (TDM) flow control methods have been utilized in asynchronous on-chip networks extensively. The synchronization required by TDM leads to significant speed penalties. Compared with using TDM methods, spatial parallelism methods, such as the spatial division multiplexing (SDM) flow control method, achieve better network throughput with less area overhead. This thesis proposes several techniques to increase spatial parallelism in the routers of asynchronous on-chip networks. Channel slicing is a new pipeline structure that alleviates the speed penalty by removing the synchronization among bit-level data pipelines. It is also found that the lookahead pipeline using early evaluated acknowledgement can be used in routers to further improve speed. SDM is a new flow control method proposed for asynchronous on-chip networks. It improves network throughput without introducing synchronization among buffers of different frames, which is required by TDM methods. It is also found that the area overhead of SDM is smaller than that of the virtual channel (VC) flow control method, the most used TDM method. The major design problem of SDM is the area-consuming crossbars. A novel 2-stage Clos switch structure is proposed to replace the crossbar in SDM routers, which significantly reduces the area overhead. This Clos switch is dynamically reconfigured by a new asynchronous Clos scheduler. Several asynchronous SDM routers are implemented using these new techniques. An asynchronous VC router is also reproduced for comparison. Performance analyses show that the SDM routers outperform the VC router in throughput, area overhead and energy efficiency. EThOS - Electronic Theses Online Service, GB, United Kingdo
Reducing the Cost of Operating a Datacenter Network
Datacenters are a significant capital expense for many enterprises. Yet, they are difficult to manage and are hard to design and maintain. The initial design of a datacenter network tends to follow vendor guidelines, but subsequent upgrades and expansions to it are mostly ad hoc, with equipment being upgraded piecemeal after its amortization period runs out and equipment acquisition is tied to budget cycles rather than changes in workload.
These networks are also brittle and inflexible. They tend to be manually managed, and cannot perform dynamic traffic engineering.
The high-level goal of this dissertation is to reduce the total cost of owning a datacenter by improving its network. To achieve this, we make the following contributions. First, we develop an automated, theoretically well-founded approach to planning cost-effective datacenter upgrades and expansions. Second, we propose a scalable traffic management framework for datacenter networks. Together, we show that these contributions can significantly reduce the cost of operating a datacenter network.
To design cost-effective network topologies, especially as the network expands over time, updated equipment must coexist with legacy equipment, which makes the network heterogeneous. However, heterogeneous high-performance network designs are not well understood. Our first step, therefore, is to develop the theory of heterogeneous Clos topologies. Using our theory, we propose an optimization framework, called LEGUP, which designs a heterogeneous Clos network to implement in a new or legacy datacenter. Although effective, LEGUP imposes a certain amount of structure on the network. To deal with situations when this is infeasible, our second contribution is a framework, called REWIRE, which uses optimization to design unstructured DCN topologies. Our results indicate that these unstructured topologies have up to 100-500% more bisection bandwidth than a fat-tree for the same dollar cost.
Our third contribution consists of two frameworks for datacenter network traffic engineering. Because of the multiplicity of end-to-end paths in DCN fabrics, such as Clos networks and the topologies designed by REWIRE, careful traffic engineering is needed to maximize throughput. This requires timely detection of elephant flows (flows that carry a large amount of data) and management of those flows. Previously proposed approaches incur high monitoring overheads, consume significant switch resources, or have long detection times.
We make two proposals for elephant flow detection. First, in the Mahout framework, we suggest that such flows be detected by observing the end hosts' socket buffers, which provide efficient visibility of flow behavior. Second, in the DevoFlow framework, we add efficient stats-collection mechanisms to network switches. Using simulations and experiments, we show that these frameworks reduce traffic engineering overheads by at least an order of magnitude while still providing near-optimal performance.
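The end-host detection idea described above can be illustrated with a simple threshold test on per-flow socket-buffer occupancy. This is a minimal sketch under stated assumptions, not the Mahout implementation: the function name, the flow identifiers, the 100 KiB threshold, and the idea of passing occupancy in as a dictionary (rather than reading it from the operating system) are all hypothetical.

```python
from typing import Dict, List

# Hypothetical threshold; a real deployment would tune this value.
DEFAULT_THRESHOLD_BYTES = 100 * 1024


def detect_elephants(buffered_bytes: Dict[str, int],
                     threshold_bytes: int = DEFAULT_THRESHOLD_BYTES) -> List[str]:
    """Return the IDs of flows whose send-buffer occupancy meets or
    exceeds the threshold, sorted for deterministic output."""
    return sorted(flow_id for flow_id, queued in buffered_bytes.items()
                  if queued >= threshold_bytes)


# Hypothetical snapshot of bytes queued in each flow's socket buffer.
snapshot = {
    "10.0.0.1:5001->10.0.0.2:80": 4 * 1024,
    "10.0.0.1:5002->10.0.0.3:443": 512 * 1024,
    "10.0.0.1:5003->10.0.0.4:22": 100 * 1024,
}
print(detect_elephants(snapshot))
```

Checking occupancy at the sender needs no per-flow counters in the switches, which is the kind of overhead the abstract says earlier approaches incur.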
Silicon photonic switching: from building block design to intelligent control
The rapid growth in data communication technologies is at the heart of enriching the digital experiences for people around the world. Encoding high bandwidth data to the optical domain has drastically changed the bandwidth-distance trade-off imposed by electrical media. Silicon photonics, sharing the technological maturity of the semiconductor industry, is a platform poised to make optical interconnect components more robust, manufacturable, and ubiquitous. One of the most prominent device classes enabled by the silicon photonics platform is photonic switching, which describes the direct routing of optical signal carriers without the optical-electrical-optical conversions. While theoretical designs and prototypes of monolithic silicon photonic switch devices have been studied, realizing high-performance and feasible switch systems requires explorations of all design aspects from basic building blocks to control systems. This thesis provides a holistic collection of studies on silicon photonic switching in topics of novel switching element designs, multi-stage switch architectures, device calibration, topology scalability, smart routing strategies, and performance-aware control plane.
First, component designs for assembling a silicon photonic switch device are presented. Structures that perform 2×2 optical switching functions are introduced. To realize switching granularities in both spatial and spectral domains, a resonator-assisted Mach-Zehnder interferometer design is demonstrated with high performance and design robustness. Next, multi-stage monolithic switching devices with microring resonator-based switching elements are investigated. An 8×8 switch device with dual-microring switching elements is presented with a well-balanced set of performance metrics in extinction ratio, crosstalk suppression, and optical bandwidth. Continued scaling in the switch port count requires both an economic increase in the number of switching elements integrated in a device and the preservation of signal quality through the switch fabric. A highly scalable switch architecture based on Clos network with microring switch-and-select sub-switches is presented as a solution to reach high switch radices while addressing key factors of insertion loss, crosstalk, and optical passband to ensure end-to-end switching performance.
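As background for the Clos-based architecture mentioned above, the classical symmetric three-stage Clos network, with n inputs per ingress switch and m middle-stage switches, is strict-sense non-blocking for unicast traffic when m ≥ 2n − 1 and rearrangeably non-blocking when m ≥ n. The sketch below encodes those textbook conditions only. The function name is an assumption, and it does not model the microring switch-and-select design or its optical penalties.

```python
def clos_blocking_class(m: int, n: int) -> str:
    """Classify a symmetric three-stage Clos network for unicast traffic.

    m: number of middle-stage switches
    n: number of inputs on each ingress-stage switch
    """
    if m < 1 or n < 1:
        raise ValueError("m and n must be positive integers")
    if m >= 2 * n - 1:
        return "strict-sense non-blocking"
    if m >= n:
        return "rearrangeably non-blocking"
    return "blocking"


for middle in (3, 4, 7):
    print(middle, clos_blocking_class(middle, n=4))
```

The gap between the two thresholds is the hardware cost of never having to reroute existing connections when a new one is added.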
The thesis then explores calibration techniques to acquire and optimize system-wide control points for integrated silicon switch devices. Applicable to common rearrangeably non-blocking switch topologies, automated procedures are developed to calibrate entire switch devices without the need for built-in power monitors. Using Mach-Zehnder interferometer-based switching elements as a demonstration, calibration techniques for optimal control points are introduced to achieve balanced push-pull drive scheme and reduced crosstalk in switching operations. Furthermore, smart routing strategies are developed based on optical penalty estimations enabled by expedited lightpath characterization procedures. Leveraging configuration redundancies in the switch fabric, the routing strategies are capable of avoiding the worst penalty optical paths and effectively elevate the bottom-line performance of the switch device.
Additional work is also presented on enhancing optical system control planes with machine learning techniques to accurately characterize complex systems and identify critical control parameters. Using flexgrid networks as a case study, light-weight machine learning workflows are tailored to devise control strategies for improving spectral power stability during wavelength assignment and defragmentation. This work affirms the efficacy of intelligent control planes to predict system dynamics and drive performance optimizations for optical interconnect systems.
On-board B-ISDN fast packet switching architectures. Phase 1: Study
The broadband integrated services digital network (B-ISDN) is an emerging telecommunications technology that will meet most of the telecommunications networking needs from the mid-1990s to early next century. The satellite-based system is well positioned for providing B-ISDN service with its inherent capabilities of point-to-multipoint and broadcast transmission, virtually unlimited connectivity between any two points within a beam coverage, short deployment time of communications facilities, flexible and dynamic reallocation of space segment capacity, and distance-insensitive cost. On-board processing satellites, particularly in a multiple spot beam environment, will provide enhanced connectivity, better performance, optimized access and transmission link design, and lower user service cost. The following are described: the user and network aspects of broadband services; the current development status in broadband services; various satellite network architectures including system design issues; and various fast packet switch architectures and their detailed designs.