3,932 research outputs found
Brief Announcement: Efficient Load-Balancing Through Distributed Token Dropping
We introduce a new graph problem, the token dropping game, and we show how to solve it efficiently in a distributed setting. We use the token dropping game as a tool to design an efficient distributed algorithm for the stable orientation problem, which is a special case of the more general locally optimal semi-matching problem. The prior work by Czygrinow et al. (DISC 2012) finds a locally optimal semi-matching in O(??) rounds in graphs of maximum degree ?, which directly implies an algorithm with the same runtime for stable orientations. We improve the runtime to O(??) for stable orientations and prove a lower bound of ?(?) rounds
EVEREST IST - 2002 - 00185 : D23 : final report
Deliverable pĂşblic del projecte europeu EVERESTThis deliverable constitutes the final report of the project IST-2002-001858 EVEREST. After its successful completion, the project presents this document that firstly summarizes the context, goal and the approach objective of the project. Then it presents a concise summary of the major goals and results, as well as highlights the most valuable lessons derived form the project work. A list of deliverables and publications is included in the annex.Postprint (published version
Datacenter Traffic Control: Understanding Techniques and Trade-offs
Datacenters provide cost-effective and flexible access to scalable compute
and storage resources necessary for today's cloud computing needs. A typical
datacenter is made up of thousands of servers connected with a large network
and usually managed by one operator. To provide quality access to the variety
of applications and services hosted on datacenters and maximize performance, it
deems necessary to use datacenter networks effectively and efficiently.
Datacenter traffic is often a mix of several classes with different priorities
and requirements. This includes user-generated interactive traffic, traffic
with deadlines, and long-running traffic. To this end, custom transport
protocols and traffic management techniques have been developed to improve
datacenter network performance.
In this tutorial paper, we review the general architecture of datacenter
networks, various topologies proposed for them, their traffic properties,
general traffic control challenges in datacenters and general traffic control
objectives. The purpose of this paper is to bring out the important
characteristics of traffic control in datacenters and not to survey all
existing solutions (as it is virtually impossible due to massive body of
existing research). We hope to provide readers with a wide range of options and
factors while considering a variety of traffic control mechanisms. We discuss
various characteristics of datacenter traffic control including management
schemes, transmission control, traffic shaping, prioritization, load balancing,
multipathing, and traffic scheduling. Next, we point to several open challenges
as well as new and interesting networking paradigms. At the end of this paper,
we briefly review inter-datacenter networks that connect geographically
dispersed datacenters which have been receiving increasing attention recently
and pose interesting and novel research problems.Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
Mixture-of-Experts (MoE) models have gained popularity in achieving
state-of-the-art performance in a wide range of tasks in computer vision and
natural language processing. They effectively expand the model capacity while
incurring a minimal increase in computation cost during training. However,
deploying such models for inference is difficult due to their large size and
complex communication pattern. In this work, we provide a characterization of
two MoE workloads, namely Language Modeling (LM) and Machine Translation (MT)
and identify their sources of inefficiencies at deployment. We propose three
optimization techniques to mitigate sources of inefficiencies, namely (1)
Dynamic gating, (2) Expert Buffering, and (3) Expert load balancing. We show
that dynamic gating improves maximum throughput by 6.21-11.23 for LM,
5.75-10.98 for MT Encoder and 2.58-5.71 for MT Decoder. It also
reduces memory usage by up to 1.36 for LM and up to 1.1 for MT.
We further propose Expert Buffering, a new caching mechanism that only keeps
hot, active experts in GPU memory while buffering the rest in CPU memory. This
reduces static memory allocation by up to 1.47. We finally propose a
load balancing methodology that provides additional scalability to the
workload
Final report on the evaluation of RRM/CRRM algorithms
Deliverable public del projecte EVERESTThis deliverable provides a definition and a complete evaluation of the RRM/CRRM algorithms selected in D11 and D15, and evolved and refined on an iterative process. The evaluation will be carried out by means of simulations using the simulators provided at D07, and D14.Preprin
Traffic Management Applications for Stateful SDN Data Plane
The successful OpenFlow approach to Software Defined Networking (SDN) allows
network programmability through a central controller able to orchestrate a set
of dumb switches. However, the simple match/action abstraction of OpenFlow
switches constrains the evolution of the forwarding rules to be fully managed
by the controller. This can be particularly limiting for a number of
applications that are affected by the delay of the slow control path, like
traffic management applications. Some recent proposals are pushing toward an
evolution of the OpenFlow abstraction to enable the evolution of forwarding
policies directly in the data plane based on state machines and local events.
In this paper, we present two traffic management applications that exploit a
stateful data plane and their prototype implementation based on OpenState, an
OpenFlow evolution that we recently proposed.Comment: 6 pages, 9 figure
- …