3,932 research outputs found

    Brief Announcement: Efficient Load-Balancing Through Distributed Token Dropping

    Get PDF
    We introduce a new graph problem, the token dropping game, and we show how to solve it efficiently in a distributed setting. We use the token dropping game as a tool to design an efficient distributed algorithm for the stable orientation problem, which is a special case of the more general locally optimal semi-matching problem. The prior work by Czygrinow et al. (DISC 2012) finds a locally optimal semi-matching in O(??) rounds in graphs of maximum degree ?, which directly implies an algorithm with the same runtime for stable orientations. We improve the runtime to O(??) for stable orientations and prove a lower bound of ?(?) rounds

    EVEREST IST - 2002 - 00185 : D23 : final report

    Get PDF
    Deliverable pĂşblic del projecte europeu EVERESTThis deliverable constitutes the final report of the project IST-2002-001858 EVEREST. After its successful completion, the project presents this document that firstly summarizes the context, goal and the approach objective of the project. Then it presents a concise summary of the major goals and results, as well as highlights the most valuable lessons derived form the project work. A list of deliverables and publications is included in the annex.Postprint (published version

    Datacenter Traffic Control: Understanding Techniques and Trade-offs

    Get PDF
    Datacenters provide cost-effective and flexible access to scalable compute and storage resources necessary for today's cloud computing needs. A typical datacenter is made up of thousands of servers connected with a large network and usually managed by one operator. To provide quality access to the variety of applications and services hosted on datacenters and maximize performance, it deems necessary to use datacenter networks effectively and efficiently. Datacenter traffic is often a mix of several classes with different priorities and requirements. This includes user-generated interactive traffic, traffic with deadlines, and long-running traffic. To this end, custom transport protocols and traffic management techniques have been developed to improve datacenter network performance. In this tutorial paper, we review the general architecture of datacenter networks, various topologies proposed for them, their traffic properties, general traffic control challenges in datacenters and general traffic control objectives. The purpose of this paper is to bring out the important characteristics of traffic control in datacenters and not to survey all existing solutions (as it is virtually impossible due to massive body of existing research). We hope to provide readers with a wide range of options and factors while considering a variety of traffic control mechanisms. We discuss various characteristics of datacenter traffic control including management schemes, transmission control, traffic shaping, prioritization, load balancing, multipathing, and traffic scheduling. Next, we point to several open challenges as well as new and interesting networking paradigms. At the end of this paper, we briefly review inter-datacenter networks that connect geographically dispersed datacenters which have been receiving increasing attention recently and pose interesting and novel research problems.Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial

    Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference

    Full text link
    Mixture-of-Experts (MoE) models have gained popularity in achieving state-of-the-art performance in a wide range of tasks in computer vision and natural language processing. They effectively expand the model capacity while incurring a minimal increase in computation cost during training. However, deploying such models for inference is difficult due to their large size and complex communication pattern. In this work, we provide a characterization of two MoE workloads, namely Language Modeling (LM) and Machine Translation (MT) and identify their sources of inefficiencies at deployment. We propose three optimization techniques to mitigate sources of inefficiencies, namely (1) Dynamic gating, (2) Expert Buffering, and (3) Expert load balancing. We show that dynamic gating improves maximum throughput by 6.21-11.23Ă—\times for LM, 5.75-10.98Ă—\times for MT Encoder and 2.58-5.71Ă—\times for MT Decoder. It also reduces memory usage by up to 1.36Ă—\times for LM and up to 1.1Ă—\times for MT. We further propose Expert Buffering, a new caching mechanism that only keeps hot, active experts in GPU memory while buffering the rest in CPU memory. This reduces static memory allocation by up to 1.47Ă—\times. We finally propose a load balancing methodology that provides additional scalability to the workload

    Final report on the evaluation of RRM/CRRM algorithms

    Get PDF
    Deliverable public del projecte EVERESTThis deliverable provides a definition and a complete evaluation of the RRM/CRRM algorithms selected in D11 and D15, and evolved and refined on an iterative process. The evaluation will be carried out by means of simulations using the simulators provided at D07, and D14.Preprin

    Traffic Management Applications for Stateful SDN Data Plane

    Get PDF
    The successful OpenFlow approach to Software Defined Networking (SDN) allows network programmability through a central controller able to orchestrate a set of dumb switches. However, the simple match/action abstraction of OpenFlow switches constrains the evolution of the forwarding rules to be fully managed by the controller. This can be particularly limiting for a number of applications that are affected by the delay of the slow control path, like traffic management applications. Some recent proposals are pushing toward an evolution of the OpenFlow abstraction to enable the evolution of forwarding policies directly in the data plane based on state machines and local events. In this paper, we present two traffic management applications that exploit a stateful data plane and their prototype implementation based on OpenState, an OpenFlow evolution that we recently proposed.Comment: 6 pages, 9 figure
    • …
    corecore