132 research outputs found

    Fourteenth Biennial Status Report: March 2017 - February 2019

    No full text

    Enhanced connectivity in wireless mobile programmable networks

    International Mention in the doctoral degree. The architecture of current operator infrastructures is being challenged by the non-stop growing demand of data-hungry services appearing every day. While currently deployed operator networks have been able to cope with traffic demands so far, the architectures for the 5th generation of mobile networks (5G) are expected to support unprecedented traffic loads while decreasing the costs associated with network deployment and operations. Indeed, the forthcoming set of 5G standards will bring programmability and flexibility to levels never seen before. This has required introducing changes in the architecture of mobile networks, enabling different features such as the split of control and data planes, as required to support rapid programming of heterogeneous data planes. Network softwarisation is hence seen as a key enabler to cope with such network evolution, as it permits controlling all networking functions through (re)programming, thus providing higher flexibility to meet heterogeneous requirements while keeping deployment and operational costs low. A great diversity in terms of traffic patterns, multi-tenancy, and heterogeneous, stringent traffic requirements is therefore expected in 5G networks. Software Defined Networking (SDN) and Network Function Virtualisation (NFV) have emerged as a basic tool-set for operators to manage their infrastructure with increased flexibility and reduced costs. As a result, new 5G services can now be envisioned and quickly programmed and provisioned in response to user and market necessities, imposing a paradigm shift in service design. However, such flexibility requires the 5G transport network to undergo a profound transformation, evolving from a static connectivity substrate into a service-oriented infrastructure capable of accommodating the various 5G services, including Ultra-Reliable and Low Latency Communications (URLLC). Moreover, to achieve the desired flexibility and cost reduction, one promising approach is to leverage virtualisation technologies to dynamically host contents, services, and applications closer to the users so as to offload the core network and reduce the communication delay. This thesis tackles the above challenges, which are detailed in the following. A common characteristic of the 5G services is the ubiquity and the almost permanent connection that is required from the mobile network. This imposes a challenge on the signalling procedures provided to keep track of the users and to guarantee session continuity. The mobility management mechanisms will hence play a central role in 5G networks because of the always-on connectivity demand. Distributed Mobility Management (DMM) helps move in this direction by flattening the network, hence improving its scalability, and enabling local access to the Internet and other communication services, like mobile-edge clouds. Simultaneously, SDN opens up the possibility of running a multitude of intelligent and advanced applications for network optimisation purposes in a centralised network controller. The combination of DMM architectural principles with SDN management appears as a powerful tool for operators to cope with the management and data burden expected in 5G networks. To meet future mobile user demand at a reduced cost, operators are also looking at solutions such as C-RAN and different functional splits to decrease the cost of deploying and maintaining cell sites.
The increasing stress on mobile radio access performance, in a context of declining revenues for operators, is hence requiring the evolution of backhaul and fronthaul transport networks, which currently operate decoupled. The heterogeneity of the nodes and transmission technologies inter-connecting the fronthaul and backhaul segments makes the network complex, costly, and inefficient to manage flexibly and dynamically. Indeed, the use of heterogeneous technologies forces operators to manage two physically separated networks, one for backhaul and one for fronthaul. In order to meet 5G requirements in a cost-effective manner, a unified 5G transport network that integrates the data, control, and management planes is hence required. Such an integrated fronthaul/backhaul transport network, denoted as crosshaul, will carry both fronthaul and backhaul traffic over heterogeneous data plane technologies, which are software-controlled so as to adapt to the fluctuating capacity demand of the 5G air interfaces. Moreover, 5G transport networks will need to accommodate a wide spectrum of services on top of the same physical infrastructure. To that end, network slicing is seen as a suitable candidate for providing the necessary Quality of Service (QoS). Traffic differentiation is usually enforced at the border of the network in order to ensure proper forwarding of the traffic, according to its class, through the backbone. With network slicing, traffic may now traverse many slice edges where the traffic policy needs to be enforced, discriminated, and ensured according to the service's and tenants' needs. However, the very property that makes this efficient and flexible management and operation possible, namely logical centralisation, poses important challenges due to the lack of proper monitoring tools suited to SDN-based architectures. In order to take timely and correct decisions while operating a network, centralised intelligence applications need to be fed with a continuous stream of up-to-date network statistics. However, this is not feasible with current SDN solutions due to scalability and accuracy issues. Therefore, an adaptive telemetry system is required to support the diversity of 5G services and their stringent traffic requirements. The path towards 5G wireless networks also presents a clear trend of carrying out computations close to end users. Indeed, pushing contents, applications, and network functions closer to end users is necessary to cope with the huge data volume and low latency required in future 5G networks. Edge and fog frameworks have emerged recently to address this challenge. Whilst the edge framework is more infrastructure-focused and mobile-operator-oriented, the fog is more pervasive and includes any node (stationary or mobile), including terminal devices. By further utilising pervasive computational resources in proximity to users, edge and fog can be merged to construct a computing platform, which can also be used as a common stage for multiple radio access technologies (RATs) to share their information, hence opening a new dimension of multi-RAT integration.
Official Doctoral Programme in Telematic Engineering. President: Carla Fabiana Chiasserini. Secretary: Vincenzo Mancuso. Committee member: Diego Rafael López García.
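
As a toy illustration of the adaptive telemetry requirement raised in this abstract, the sketch below polls a switch counter at a rate that tracks how quickly traffic is changing, so a centralised SDN controller receives fresh statistics without constant polling. The stats call, thresholds, and intervals are hypothetical placeholders, not the thesis's design.

```python
import random
import time

MIN_INTERVAL, MAX_INTERVAL = 0.5, 16.0   # polling bounds in seconds (assumed)
CHANGE_THRESHOLD = 0.10                  # 10% relative change (assumed)

def get_port_bytes(port_id: int) -> int:
    """Placeholder for a controller stats request (e.g. per-port byte counters).
    Simulated with a random walk so the sketch is self-contained."""
    get_port_bytes.counter += random.randint(0, 10_000)
    return get_port_bytes.counter
get_port_bytes.counter = 0

def monitor(port_id: int, cycles: int = 10) -> None:
    """Poll one port, shortening the interval when traffic changes quickly."""
    interval = 1.0
    last = get_port_bytes(port_id)
    for _ in range(cycles):
        time.sleep(interval)
        current = get_port_bytes(port_id)
        change = abs(current - last) / max(last, 1)
        # Fast-changing counter -> poll more often; stable one -> back off.
        interval = (max(MIN_INTERVAL, interval / 2) if change > CHANGE_THRESHOLD
                    else min(MAX_INTERVAL, interval * 2))
        print(f"port {port_id}: bytes={current}, next poll in {interval}s")
        last = current

monitor(port_id=1)
```

Doubling and halving the interval is the simplest adaptation scheme; a real telemetry system would also bound the controller's aggregate query load across all monitored ports.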

    Routing on the Channel Dependency Graph: A New Approach to Deadlock-Free, Destination-Based, High-Performance Routing for Lossless Interconnection Networks

    In the pursuit of ever-increasing compute power, and with Moore's law slowly coming to an end, high-performance computing has started to scale out to larger systems. Alongside the increasing system size, the interconnection network is growing to accommodate and connect tens of thousands of compute nodes. These networks have a large influence on the total cost, application performance, energy consumption, and overall system efficiency of the supercomputer. Unfortunately, state-of-the-art routing algorithms, which define the packet paths through the network, do not utilize this important resource efficiently. Topology-aware routing algorithms are becoming increasingly inapplicable due to irregular topologies, which are either irregular by design or, more often, the result of hardware failures. Exchanging faulty network components potentially requires whole-system downtime, further increasing the cost of the failure. This management approach becomes more and more impractical due to the scale of today's networks and the accompanying steady decrease of the mean time between failures. Alternative methods of operating and maintaining these high-performance interconnects, in terms of both hardware and software management, are necessary to mitigate the negative effects experienced by scientific applications executed on the supercomputer. However, existing topology-agnostic routing algorithms either suffer from poor load balancing or are not bounded in the number of virtual channels needed to resolve deadlocks in the routing tables. The fail-in-place strategy, a well-established method in storage systems of repairing only critical component failures, is a feasible solution for current and future HPC interconnects as well as other large-scale installations such as data center networks. However, an appropriate combination of topology and routing algorithm is required to minimize the throughput degradation for the entire system. This thesis contributes a network simulation toolchain to facilitate the process of finding a suitable combination, either during system design or while the system is in operation. On top of this foundation, a key contribution is a novel scheduling-aware routing, which reduces fault-induced throughput degradation while improving overall network utilization. The scheduling-aware routing performs frequent property-preserving routing updates to optimize the path balancing for simultaneously running batch jobs. The increased deployment of lossless interconnection networks, in conjunction with fail-in-place modes of operation and topology-agnostic, scheduling-aware routing algorithms, necessitates new solutions to the routing-deadlock problem. Therefore, this thesis further advances the state of the art by introducing a novel concept of routing on the channel dependency graph, which allows the design of a universally applicable destination-based routing capable of optimizing the path balancing without exceeding a given number of virtual channels, which are a common hardware limitation. This disruptive innovation enables implicit deadlock avoidance during path calculation, instead of solving both problems separately as all previous solutions do.
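
For context on the deadlock condition that routing on the channel dependency graph manipulates, the sketch below implements the classic acyclicity test: model each link as a channel, record a dependency whenever a route uses two channels in sequence, and declare the routing deadlock-free only if the resulting graph has no cycle (the Dally-Seitz criterion). The channel encoding and the example routes are illustrative assumptions, not the thesis's toolchain.

```python
from collections import defaultdict

def build_cdg(routes):
    """Each route is a list of channels (e.g. (src, dst) link tuples);
    consecutive channels in a route form a dependency edge."""
    cdg = defaultdict(set)
    for route in routes:
        for a, b in zip(route, route[1:]):
            cdg[a].add(b)
    return cdg

def has_cycle(cdg):
    """Iterative three-colour DFS: a back edge means a cyclic CDG."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)
    for start in list(cdg):
        if colour[start] != WHITE:
            continue
        stack = [(start, iter(cdg[start]))]
        colour[start] = GREY
        while stack:
            node, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                colour[node] = BLACK   # all successors explored
                stack.pop()
            elif colour[nxt] == GREY:
                return True            # back edge: deadlock is possible
            elif colour[nxt] == WHITE:
                colour[nxt] = GREY
                stack.append((nxt, iter(cdg[nxt])))
    return False

# Two routes whose channel usage forms a cycle -> not deadlock-free.
routes = [[("A", "B"), ("B", "C")], [("B", "C"), ("A", "B")]]
print("deadlock-free:", not has_cycle(build_cdg(routes)))   # False
```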

    Methods and Applications of Synthetic Data Generation

    The advent of data mining and machine learning has highlighted the value of large and varied sources of data, while increasing the demand for synthetic data that captures the structural and statistical characteristics of the original data without revealing personal or proprietary information contained in the original dataset. In this dissertation, we use examples from original research to show that, using appropriate models and input parameters, synthetic data that mimics the characteristics of real data can be generated with sufficient rate and quality to address the volume, structural complexity, and statistical variation requirements of research and development of digital information processing systems. First, we present a progression of research studies using a variety of tools to generate synthetic network traffic patterns, enabling us to observe relationships between network latency and communication pattern benchmarks at all levels of the network stack. We then present a framework for synthesizing large-scale IoT data with complex structural characteristics in a scalable extraction and synthesis framework, and demonstrate the use of generated data in the benchmarking of IoT middleware. Finally, we detail research on synthetic image generation for deep learning models using 3D modeling. We find that synthetic images can be an effective technique for augmenting limited sets of real training data, and in use cases that benefit from incremental training or model specialization, we find that pretraining on synthetic images provides a usable base model for transfer learning.
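
As a minimal, hypothetical illustration of the statistical side of synthetic data generation (not the dissertation's actual generators), the sketch below fits a multivariate Gaussian to real records and samples new ones, preserving means and covariances without reusing any original record.

```python
import numpy as np

def synthesize_gaussian(real: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    """Fit a multivariate Gaussian to `real` (rows = records) and sample from it."""
    rng = np.random.default_rng(seed)
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_samples)

# Example: 10k synthetic records mimicking 1k real 4-feature records.
real = np.random.default_rng(1).normal(size=(1000, 4))
synthetic = synthesize_gaussian(real, 10_000)
assert np.allclose(real.mean(axis=0), synthetic.mean(axis=0), atol=0.1)
```

A Gaussian only captures second-order statistics; structured sources such as traffic traces or IoT event streams call for the richer model-driven generators the abstract describes.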

    Designing, Building, and Modeling Maneuverable Applications within Shared Computing Resources

    Extending the military principle of maneuver into the war-fighting domain of cyberspace, academic and military researchers have produced many theoretical and strategic works, though few have focused on researching actual applications and systems that apply this principle. We present our research in designing, building, and modeling maneuverable applications in order to gain the system advantages of resource provisioning, application optimization, and cybersecurity improvement. We have coined the phrase “Maneuverable Applications”, defined as distributed and parallel applications that take advantage of the modification, relocation, addition, or removal of computing resources, giving the perception of movement. Our work with maneuverable applications has been within shared computing resources, such as the Clemson University Palmetto cluster, where multiple users share access and time to a collection of inter-networked computers and servers. In this dissertation, we describe our implementation and analytic modeling of environments and systems to maneuver computational nodes, network capabilities, and security enhancements for overcoming challenges to a cyberspace platform. Specifically, we describe our work to create a system to provision a big data computational resource within academic environments. We also present a computing testbed built to allow researchers to study network optimizations of data centers. We discuss our Petri net model of an adaptable system, which increases its cybersecurity posture in the face of varying levels of threat from malicious actors. Lastly, we present work on and investigation into integrating these technologies into a prototype resource manager for maneuverable applications and validating our model using this implementation.
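
A minimal sketch of the kind of Petri net reasoning mentioned above: places carry tokens and a transition fires only when all of its input places are marked. The places and transitions below are invented for illustration and are not the dissertation's actual model.

```python
class PetriNet:
    def __init__(self, marking):
        self.marking = dict(marking)   # place -> token count
        self.transitions = {}          # name -> (input places, output places)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= 1 for p in inputs)

    def fire(self, name):
        if not self.enabled(name):
            raise RuntimeError(f"transition {name!r} not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1       # consume one token per input place
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1

# Hypothetical example: escalate security posture when a threat is detected.
net = PetriNet({"normal_ops": 1, "threat_detected": 1})
net.add_transition("escalate", ["normal_ops", "threat_detected"], ["hardened_ops"])
if net.enabled("escalate"):
    net.fire("escalate")
print(net.marking)   # {'normal_ops': 0, 'threat_detected': 0, 'hardened_ops': 1}
```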

    Distributed String Sorting Algorithms


    Techniques for High Performance Matching

    With the growth of big data application demands, improving high-performance computing (HPC) becomes an essential industry task. High-performance matching is a critical performance path for HPC communications because it significantly impacts computing performance and profoundly affects networking performance. This dissertation focuses on improving high-performance matching in HPC networks to keep up with the increasingly heavy demands of evolving applications, tackling the matching problem from both the computational and the network aspects. On the one hand, the Message Passing Interface (MPI) is a de facto standard for the communication of parallel processes in an HPC network [1]. MPI has delivered excellent performance for running large-scale scientific applications in petascale systems. Beyond the petascale system, the exascale system is evolving to run even larger applications where the computing job size increases dramatically. This trend enlarges the message queues and degrades MPI message matching performance. With the increasing requirements of big data applications, MPI message matching is a critical performance path for HPC communications. On the other hand, with the blooming of network techniques and the fast-growing size of network applications, users are seeking more enhanced, secure, and varied network services. In an HPC network, the HPC cluster comprises multiple interconnected nodes in a switched network. With the integration of software-defined networking (SDN) technology into the HPC network, both the computational and network resources can be allocated efficiently according to the applications' requirements. Thus, SDN switches are deployed in HPC networks to support high-performance, differentiated network services and guarantee diverse users' needs, such as firewalls, load balancing, and quality of service [2]. In an SDN switch, packet classification assigns incoming packets to flows according to the rules generated in the control plane, which is a switch's core function. Therefore, packet classification becomes a critical performance path for the HPC network. First, this dissertation presents GenMatcher, a generic and software-only arbitrary matching framework for fast and efficient searches in packet classification. The goal is to represent arbitrary rules with efficient prefix-based tries. In order to generate efficient trie groupings and expansions to support all arbitrary rules, we propose a clustering-based grouping algorithm that groups rules based upon their bit-level similarities. Our algorithm generates near-optimal trie groupings with low configuration times and provides significantly higher match throughput than prior techniques. Experiments with synthetic traffic show that our method can achieve a 58.9X speedup compared to the baseline on a single-core processor under a given memory constraint [3]. Second, to further improve GenMatcher's performance, this dissertation proposes GenSMatcher, an efficient Single Instruction Multiple Data (SIMD) and cache-friendly arbitrary matching framework. GenSMatcher adopts a trie node with a fixed high fanout and a varying span for each node depending on the data distribution. The layout of the trie node leverages caches and modern processor features such as SIMD instructions.
To support arbitrary matching, we interpret arbitrary rules as three fields, value, mask, and priority, and then propose the GenSMatcher extraction algorithm, which processes the wildcard bits to support arbitrarily positioned wildcards in arbitrary rules. Finally, we add an array of wildcard entries to the leaf entries, which stores the wildcard rules and guarantees correct matching results. Experiments show that GenSMatcher outperforms GenMatcher on large-scale rulesets and key sets regarding search time, insert time, and memory cost. Specifically, with 5M rules, our method achieves a 2.7X speedup on search time, and insertion takes ∼7.3 seconds, a 1.38X speedup; meanwhile, the memory cost reduction is up to 6.17X. Third, to guarantee the MPI ordering feature and high-performance matching for big applications on MPI tag matching, this dissertation introduces a new hybrid data structure and matching mechanism to address the performance challenges, reducing the matching operation time in the posted receive queue (PRQ) and unexpected message queue (UMQ). The hybrid data structures are composed of tries and hash maps. We evaluate our mechanism on microbenchmarks and existing MPI applications with different numbers of processes. Experiments with synthetic message flow show that our method can achieve a 20X search-time speedup compared to the single-core processor baseline. For the PICSARlite application, we integrated our Hybrid and the Intel mechanisms into the MPICH library and evaluated their performance on the Ada cluster of Texas A&M University, which has 793 general compute nodes. The experimental outcome shows that our proposed Hybrid mechanism can achieve up to a 1.55X speedup compared to the MPICH library method.
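
To make the (value, mask, priority) rule representation concrete, the sketch below shows the semantics that trie-based matchers like GenMatcher and GenSMatcher accelerate: a key matches a rule when it agrees with the rule's value on every masked bit, and the highest-priority match wins. The linear scan and the example rules are illustrative only.

```python
from typing import NamedTuple, Optional

class Rule(NamedTuple):
    value: int      # bit pattern to match against
    mask: int       # 1-bits are significant, 0-bits are wildcards
    priority: int   # higher wins among matching rules
    action: str

def classify(key: int, rules: list[Rule]) -> Optional[Rule]:
    """Return the highest-priority rule matching `key`, or None."""
    best = None
    for r in rules:
        if (key & r.mask) == (r.value & r.mask):
            if best is None or r.priority > best.priority:
                best = r
    return best

rules = [
    Rule(value=0b1010_0000, mask=0b1111_0000, priority=10, action="drop"),
    Rule(value=0b0000_0000, mask=0b0000_0000, priority=0,  action="forward"),  # default
]
print(classify(0b1010_1111, rules).action)   # drop (more specific rule wins)
print(classify(0b0101_0101, rules).action)   # forward (only the default matches)
```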

    Cloud-efficient modelling and simulation of magnetic nano materials

    Scientific simulations are rarely attempted in a cloud due to the substantial performance costs of virtualization. Considerable communication overheads, intolerable latencies, and inefficient hardware emulation are the main reasons why this emerging technology has not been fully exploited. On the other hand, the progress of computing infrastructure nowadays depends strongly on the development of prospective storage media, where efficient micromagnetic simulations play a vital role in future memory design. This thesis addresses both these topics by merging micromagnetic simulations with the latest OpenStack cloud implementation, providing a time- and cost-effective alternative to expensive computing centers. However, many challenges have to be addressed before a high-performance cloud platform emerges as a solution for problems in micromagnetic research communities. First, the best solver candidate has to be selected and further improved, particularly in the parallelization and process communication domain. Second, a 3-level cloud communication hierarchy needs to be recognized and each segment adequately addressed. The required steps include breaking VM isolation to activate the host's shared memory, tuning and optimizing the cloud network stack, and integrating efficient communication hardware. The project work concludes with practical measurements and confirmation of the simulation successfully implemented in an open-source cloud environment. As a result, the renewed Magpar solver runs for the first time in the OpenStack cloud by using ivshmem for shared-memory communication. Extensive measurements also proved the effectiveness of our solutions, yielding from sixty percent to over ten times better results than those achieved in the standard cloud.
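
As a rough, host-side analogue of the ivshmem-based communication described above (illustrative only; ivshmem itself exposes a PCI shared-memory device to co-located VMs), the sketch below exchanges data between two processes through a named POSIX shared-memory segment. The segment name, size, and layout are assumptions.

```python
from multiprocessing import Process, shared_memory

SEG_NAME, SEG_SIZE = "sim_halo", 1024   # hypothetical halo-exchange buffer

def writer():
    # Attach to the existing segment and deposit data, e.g. boundary
    # values produced by one solver rank.
    shm = shared_memory.SharedMemory(name=SEG_NAME)
    shm.buf[:5] = b"hello"
    shm.close()

if __name__ == "__main__":
    # Create the shared segment, let a second process write into it,
    # then read the bytes back without any copy through the network stack.
    shm = shared_memory.SharedMemory(name=SEG_NAME, create=True, size=SEG_SIZE)
    p = Process(target=writer)
    p.start()
    p.join()
    print(bytes(shm.buf[:5]))   # b'hello'
    shm.close()
    shm.unlink()
```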

    Configurable data center switch architectures

    In this thesis, we explore alternative architectures for implementing configurable data center switches, along with the advantages that such switches can provide. Our first contribution centers around determining switch architectures that can be implemented on Field Programmable Gate Arrays (FPGAs) to provide configurable switching protocols. In the process, we identify a gap in the availability of frameworks to realistically evaluate the performance of switch architectures in data centers and contribute a simulation framework that relies on realistic data center traffic patterns. Our framework is then used to evaluate the performance of currently existing as well as newly proposed FPGA-amenable switch designs. Through collaborative work with Meng and Papaphilippou, we establish that only small- to medium-range switches can be implemented on today's FPGAs. Our second contribution is a novel switch architecture that integrates a custom in-network hardware accelerator with a generic switch to accelerate Deep Neural Network training applications in data centers. Our proposed accelerator architecture is prototyped on an FPGA, and a scalability study is conducted to demonstrate the trade-offs of an FPGA implementation when compared to an ASIC implementation. In addition to the hardware prototype, we contribute a lightweight load-balancing and congestion control protocol that leverages the unique communication patterns of ML data-parallel jobs to enable fair sharing of network resources across different jobs. Our large-scale simulations demonstrate the ability of our novel switch architecture and lightweight congestion control protocol to both accelerate the training time of machine learning jobs by up to 1.34x and benefit other latency-sensitive applications by reducing their 99th-percentile completion time by up to 4.5x. As for our final contribution, we identify the main requirements of in-network applications and propose a Network-on-Chip (NoC)-based architecture for supporting a heterogeneous set of applications. Observing the lack of tools to support such research, we provide a tool that can be used to evaluate NoC-based switch architectures.
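
As a small, hypothetical illustration of the fair-sharing goal behind the proposed congestion control protocol (not the protocol itself), the sketch below computes a max-min fair allocation of link capacity across jobs, so a bandwidth-hungry ML job cannot starve latency-sensitive traffic. The job names, demands, and capacity are invented for the example.

```python
def max_min_fair(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Max-min fair allocation: repeatedly split the remaining capacity
    equally, fully satisfying any job whose demand fits within its share."""
    alloc = {job: 0.0 for job in demands}
    active = set(demands)
    remaining = capacity
    while active:
        share = remaining / len(active)
        satisfied = {j for j in active if demands[j] - alloc[j] <= share}
        if not satisfied:
            # No job can be fully satisfied: split what is left equally.
            for j in active:
                alloc[j] += share
            break
        for j in satisfied:
            remaining -= demands[j] - alloc[j]
            alloc[j] = demands[j]
        active -= satisfied
    return alloc

print(max_min_fair(10.0, {"ml_job": 8.0, "web": 2.0, "kv_store": 3.0}))
# {'ml_job': 5.0, 'web': 2.0, 'kv_store': 3.0}
```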