11 research outputs found

    Host and Network Optimizations for Performance Enhancement and Energy Efficiency in Data Center Networks

    Modern data centers host hundreds of thousands of servers to achieve economies of scale. Such a huge number of servers makes it challenging for the data center network (DCN) to provide proportionally large bandwidth. In addition, the deployment of virtual machines (VMs) in data centers raises the requirements for efficient resource allocation and fine-grained resource sharing. Further, the large number of servers and switches in the data center consumes a significant amount of energy. Even though servers are becoming more energy efficient thanks to various energy-saving techniques, the DCN still accounts for 20% to 50% of the energy consumed by the entire data center. The objective of this dissertation is to enhance DCN performance as well as its energy efficiency by conducting optimizations on both the host and network sides. First, as the DCN demands huge bisection bandwidth to interconnect all the servers, we propose a parallel packet switch (PPS) architecture that directly processes variable-length packets without segmentation-and-reassembly (SAR). The proposed PPS achieves large bandwidth by combining the switching capacities of multiple fabrics, and it further improves switch throughput by avoiding the padding bits that SAR requires. Second, since certain resource demands of a VM are bursty and stochastic in nature, we propose the Max-Min Multidimensional Stochastic Bin Packing (M3SBP) algorithm to satisfy both deterministic and stochastic demands in VM placement. M3SBP calculates an equivalent deterministic value for the stochastic demands and maximizes the minimum resource utilization ratio of each server. Third, to provide the necessary traffic isolation for VMs that share the same physical network adapter, we propose the Flow-level Bandwidth Provisioning (FBP) algorithm. By reducing the flow scheduling problem to multiple stages of packet queuing problems, FBP guarantees the provisioned bandwidth and delay performance for each flow. Finally, because DCNs are typically provisioned with full bisection bandwidth while DCN traffic fluctuates, we propose a joint host-network optimization scheme to enhance the energy efficiency of DCNs during off-peak traffic hours. The proposed scheme uses a unified representation method that converts the VM placement problem into a routing problem and employs depth-first and best-fit search to find efficient paths for flows.
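
The two M3SBP ideas named in the abstract above (an equivalent deterministic value for a stochastic demand, and maximizing the minimum utilization ratio) can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: demands are modeled as normally distributed, the equivalent value is taken as mean plus a quantile multiple of the standard deviation, and the placement rule is a simple greedy pass. The names `equivalent_demand`, `place_vm`, and `overflow_prob` are hypothetical.

```python
from statistics import NormalDist


def equivalent_demand(mean, std, overflow_prob=0.05):
    """Deterministic stand-in for a stochastic demand.

    Assumption: the demand is normally distributed, and we reserve enough
    capacity that it is exceeded with probability at most `overflow_prob`.
    """
    z = NormalDist().inv_cdf(1.0 - overflow_prob)
    return mean + z * std


def place_vm(servers, demand):
    """Greedy placement on the server with the highest resulting minimum
    utilization ratio across resource dimensions.

    servers: list of dicts with 'capacity' and 'used' lists (one entry per
             resource dimension, e.g. CPU and bandwidth).
    demand:  list of per-dimension demands (already made deterministic).
    Returns the index of the chosen server, or None if no server fits.
    """
    best_idx, best_score = None, -1.0
    for idx, srv in enumerate(servers):
        new_used = [u + d for u, d in zip(srv["used"], demand)]
        # Skip servers where any dimension would exceed its capacity.
        if any(n > c for n, c in zip(new_used, srv["capacity"])):
            continue
        # Score = utilization ratio of the least-utilized dimension.
        score = min(n / c for n, c in zip(new_used, srv["capacity"]))
        if score > best_score:
            best_idx, best_score = idx, score
    if best_idx is not None:
        srv = servers[best_idx]
        srv["used"] = [u + d for u, d in zip(srv["used"], demand)]
    return best_idx
```

A caller would first convert each stochastic dimension with `equivalent_demand` and then pass the resulting vector to `place_vm`; favoring the highest minimum ratio tends to fill servers evenly across dimensions rather than exhausting one resource early.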

    Cross-Layer Design for Energy Efficiency on Data Center Network

    Energy-efficient infrastructure, or green IT (Information Technology), has recently become a hot-button issue for most corporations as they strive to eliminate every inefficiency from their enterprise IT systems and save capital and operational costs. Vendors of IT equipment now compete on the power efficiency of their devices, and as a result, many new equipment models are indeed more energy efficient. Various studies have estimated the annual electricity consumed by networking devices in the U.S. at 6 to 20 terawatt-hours. Our research offers promising solutions for reducing this consumption. An energy-efficient data center network architecture that lowers energy consumption is highly desirable. First, we propose a fair bandwidth allocation algorithm that adopts the max-min fairness principle to decrease power consumption on packet switch fabric interconnects. Specifically, we include a power-aware computing factor, because high power dissipation in switches is fast becoming a key problem owing to increasing line speeds and decreasing chip sizes. This algorithm not only reduces the number of iterations needed to converge but also lowers processing power utilization on switch fabric interconnects. Second, we study the deployment strategy of multicast switches in hybrid mode in an energy-aware data center network, using the well-known fat-tree topology as a case study. The objective is to find the best location to deploy a multicast switch, not only to achieve optimal bandwidth utilization but also to minimize power consumption. We show that our proposed algorithm can cut energy consumption by nearly 50%. Finally, although a number of energy optimization solutions exist for DCNs, they consider either the hosts or the network, but not both. We propose a joint optimization scheme that simultaneously optimizes virtual machine (VM) placement and network flow routing to maximize energy savings. The simulation results demonstrate that our design outperforms existing host-only and network-only optimization solutions and closely approximates the ideal but NP-complete linear program. In sum, this study could guide future eco-friendly data center networks that deploy our algorithms on four major layers of the seven-layer OSI model (physical, data link, network, and application) to reduce power consumption in green data centers.
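
The max-min fairness principle mentioned in the abstract above can be sketched with the textbook progressive-filling procedure for a single shared link. This is not the thesis's power-aware algorithm, whose details the abstract does not give; the function name `max_min_fair` and the single-link setting are assumptions made for illustration.

```python
def max_min_fair(capacity, demands):
    """Textbook max-min fair allocation of one link's capacity.

    Flows are served in order of increasing demand. A flow whose demand
    fits within an equal share of the remaining capacity gets its full
    demand; otherwise all remaining flows split what is left equally.
    """
    alloc = [0.0] * len(demands)
    remaining = float(capacity)
    # Indices of unsatisfied flows, smallest demand first.
    active = sorted(range(len(demands)), key=lambda i: demands[i])
    while active:
        share = remaining / len(active)
        i = active[0]
        if demands[i] <= share:
            # Smallest demand fits: satisfy it and redistribute the rest.
            alloc[i] = float(demands[i])
            remaining -= demands[i]
            active.pop(0)
        else:
            # No remaining flow fits: everyone gets the equal share.
            for j in active:
                alloc[j] = share
            break
    return alloc
```

For example, with a capacity of 10 and demands of 2, 2.6, 4, and 5, the two small flows are fully served and the two large flows each receive 2.7, so no flow can gain bandwidth without taking it from a flow that already has less or the same.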

    Mesh-of-Trees Interconnection Network for an Explicitly Multi-Threaded Parallel Computer Architecture

    As the multiple-decade-long increase in clock rates starts to slow down, mainstream general-purpose processors are evolving towards single-chip parallel processing. On-chip interconnection networks are essential components of such machines, supporting the communication between processors and the memory system. This task is especially challenging for some easy-to-program parallel computers, which are designed with performance-demanding memory systems. This study proposes an interconnection network with a novel implementation of the Mesh-of-Trees (MoT) topology. The MoT network is evaluated relative to metrics such as wire area complexity, total register count, bandwidth, network diameter, single switch delay, maximum throughput per area, trade-offs between throughput and latency, and post-layout performance. It is also compared with other traditional network topologies, such as mesh, ring, hypercube, butterfly, fat trees, butterfly fat trees, and replicated butterfly networks. Concrete results show that MoT provides higher throughput and lower latency at comparable area cost, especially when the input traffic (or the on-chip parallelism) is high. The layout of the MoT network is evaluated using standard cell design methodology. A prototype chip with an 8-terminal MoT network was taped out in 90 nm technology and tested. In the context of an easy-to-program single-chip parallel processor, the MoT network is embedded in the eXplicit Multi-Threading (XMT) architecture and evaluated by running parallel applications. In addition to the basic MoT architecture, a novel hybrid extension of MoT is proposed, which allows significant area savings with a small reduction in throughput.

    Efficient QoS support for high-performance interconnects

    Interconnection networks are a key component in a large number of systems. Quality of service (QoS) mechanisms are responsible for ensuring that a certain level of performance is achieved in the network. Traditional solutions for providing QoS in high-performance interconnection networks are usually based on complex architectures. The main objective of this thesis is to investigate whether efficient QoS mechanisms can be offered. Our aim is to achieve full QoS support with minimal resources. To this end, redundancies in the proposed QoS mechanisms are identified and eliminated without affecting performance. The thesis consists of three parts. In the first, we start with the traditional proposals for QoS at the traffic-class level. In the second part, we propose how to adapt deadline-based QoS mechanisms to high-performance interconnection networks. Finally, we also investigate the interaction of QoS mechanisms with congestion control.

    Packet switch architecture for efficient unicast and multicast traffic switching

    The dissertation proposes a simple switch architecture as well as algorithms for efficient scheduling and switching of unicast and multicast traffic, which is of great importance for modern telecommunication networks because their traffic load is constantly and rapidly increasing. The first part of the dissertation's contributions comprises a proposed switch that efficiently manages unicast traffic. The proposed switch is developed by using the best characteristics of the existing solutions while avoiding some of their drawbacks. The aim is to enable fast packet forwarding while keeping hardware complexity at an acceptable level. The proposed solution combines an architecture with buffers at the input ports and the Birkhoff-von Neumann architecture, which is based on deterministic switch module configurations. Hence, calculation of switch module configurations is not needed. In addition, unlike most solutions based on the Birkhoff-von Neumann principle, a folded architecture is possible, which means that only one physical switching module is used for both switching stages of the Birkhoff-von Neumann architecture. A simple algorithm for packet scheduling has been developed in order to avoid packet out-of-sequence problems. Finally, improved fair service support is introduced for the originally proposed switch solution. The second part of the dissertation is devoted to enhancing the proposed unicast switch for efficient management of multicast traffic. The need for multicast support has emerged as a consequence of the development and introduction of new services (such as IPTV, online gaming, etc.) that generate multicast traffic. As the amount of multicast traffic is no longer negligible, the performance of packet switches that were primarily developed for unicast traffic degrades significantly. The solution proposed in the first part of the dissertation is enhanced with a module for multicast traffic management. Here, the idea is that the multicast load at an input port is distributed over the ports that are also destinations for the multicast packets. This approach enables relatively simple but efficient management of multicast traffic. Software simulations conducted in this dissertation confirmed that the proposed solutions achieve very good performance compared to existing solutions. Furthermore, a hardware implementation of the proposed basic unicast switch solution shows modest requirements in terms of hardware resources.
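
The reason a Birkhoff-von Neumann switch of this kind needs no configuration calculation can be shown with a minimal sketch. It assumes the commonly used cyclic-shift sequence, in which input `i` is connected to output `(i + t) mod N` in time slot `t`; the abstract above does not state which deterministic sequence the dissertation uses, and the function names `configuration` and `schedule` are hypothetical.

```python
def configuration(n_ports, slot):
    """Crossbar setting for one time slot under a cyclic-shift pattern.

    Returns a list where entry i is the output connected to input i.
    Assumption: the fixed sequence is input i -> output (i + slot) mod N.
    """
    return [(i + slot) % n_ports for i in range(n_ports)]


def schedule(n_ports):
    """One full period of configurations (N slots for an N-port switch)."""
    return [configuration(n_ports, t) for t in range(n_ports)]
```

Each configuration is a permutation, so no two inputs contend for the same output, and over one period every input is connected to every output exactly once. The sequence depends only on the slot counter, never on queue state, which is why no scheduler has to compute a matching.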

    On-chip networks for manycore architecture

    Thesis (Ph.D.), Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (pages 109-116). Over the past decade, increasing the number of cores on a single processor has successfully enabled continued improvements in computer performance. Further scaling these designs to tens and hundreds of cores, however, still presents a number of hard problems, such as scalability, power efficiency, and effective programming models. A key component of manycore systems is the on-chip network, which faces increasing efficiency demands as the number of cores grows. In this thesis, we present three techniques for improving the efficiency of on-chip interconnects. First, we present PROM (Path-based, Randomized, Oblivious, and Minimal routing) and BAN (Bandwidth Adaptive Networks), techniques that offer efficient intercore communication for bandwidth-constrained networks. Next, we present ENC (Exclusive Native Context), the first deadlock-free, fine-grained thread migration protocol developed for on-chip networks. ENC demonstrates that a simple and elegant technique in the on-chip network can provide critical functional support for higher-level application and system layers. Finally, we provide a realistic context by sharing our hands-on experience in the physical implementation of the on-chip network for the Execution Migration Machine, an ENC-based 110-core processor fabricated in 45 nm ASIC technology. By Myong Hyon Cho, Ph.D.

    Hardware Support for Efficient Packet Processing

    Scalability is the key ingredient for further increasing the performance of today's supercomputers. As other approaches such as frequency scaling reach their limits, parallelization is the only feasible way to further improve performance. To parallelize such systems further, the time required for communication must be kept as small as possible. In the first part of this thesis, ways to reduce the latency incurred in packet-based interconnection networks are analyzed, and several new architectural solutions are proposed. These solutions have been tested and proven in a field-programmable gate array (FPGA) environment. In addition, a hardware (HW) structure is presented that enables low-latency packet processing for financial markets. The second part, and the main contribution of this thesis, is a newly designed crossbar architecture. It introduces a novel way to integrate multicast capability into a crossbar design. Furthermore, an efficient implementation of adaptive routing that reduces vulnerability to congestion in packet-based interconnection networks is shown. The low latency of the design is demonstrated through simulation, and its scalability is supported by synthesis results. The third part concentrates on the improvements and modifications made to EXTOLL, a high-performance interconnection network specifically designed for low-latency and high-throughput applications. Contributions include modules enabling an efficient integration of multiple host interfaces as well as the integration of the on-chip interconnect. Additionally, some of the existing functionality has been revised and improved to reach better performance and lower latency. Micro-benchmark results are presented to underline the contribution of these modifications.

    Floorplan-Aware High Performance NoC Design

    Current multicore architectures such as chip multiprocessors (CMPs) and multiprocessor systems-on-chip (MPSoCs) have adopted networks-on-chip (NoCs) as the optimal element for interconnecting the various components of these systems. In this regard, manufacturers of CMPs and MPSoCs have adopted simple NoCs, generally with a mesh or ring topology, since they are sufficient to meet the needs of current systems. However, as system requirements (low latency and high throughput) become more demanding, such simple networks cease to be a viable solution. The research community has therefore proposed and analyzed more complex NoCs. These solutions, however, are harder to implement, especially their long links, making such complex topologies too costly or even infeasible. In this thesis, we present a design methodology that minimizes the loss of network performance caused by its actual implementation. The main problems encountered when implementing a NoC are the switches and the long links. In this thesis, the switch is made modular, that is, built as a union of smaller modules. In our case, the modules are identical, and each module is able to arbitrate, switch, and store the messages it receives. We then make the placement of these modules on the chip flexible, allowing modules of the same switch to be distributed across the chip. We have applied this design methodology to different scenarios. First, we introduced our modular switch into NoCs with well-known topologies such as the 2D mesh. The results show how the modularity and distribution of the switch reduce the latency and power consumption of the network. Second, we used our design methodology to implement a distributed crossbar…
    Roca Pérez, A. (2012). Floorplan-Aware High Performance NoC Design [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/17844

    Reform of Building Regulation

    The Productivity Commission's final research report, released in December 2004, responds to a request by the Australian Government to examine the contribution that national reform of building regulation has made, and that further reform could make, to the performance of the building and construction industry. The Commission found that the Australian Building Codes Board has made progress in reducing regulatory differences across jurisdictions and in basing the Building Code of Australia on performance-based requirements. However, there is scope for further reforms to enhance productivity and to benefit the broader community. The Commission recommends that the Australian Government, as well as the State and Territory Governments, continue to be actively involved in the reform of building regulation and negotiate a new Intergovernmental Agreement. The agreement would clarify the objectives of building regulation reform, strengthen the commitment to national consistency, and affirm the importance of a whole-of-government approach to building regulation.
    Australia; Commissioned study; Australian Building Codes Board (ABCB); Building; Construction; Economics; Inter-Government Agreement; Policy; Reform; Regulation