Optimized Contract-based Model for Resource Allocation in Federated Geo-distributed Clouds
In the era of Big Data, with data growing massively in scale and velocity, cloud computing and its pay-as-you-go model continue to provide significant cost benefits and a seamless service delivery model for cloud consumers. The evolution of small-scale and large-scale geo-distributed datacenters operated and managed by individual Cloud Service Providers (CSPs) raises new challenges in sharing global resources effectively and in managing autonomously controlled individual datacenter resources toward a globally efficient resource allocation model. Earlier solutions for geo-distributed clouds have focused primarily on achieving global efficiency in resource sharing; although they try to maximize global resource allocation, they cause significant inefficiencies in local resource allocation for individual datacenters and cloud providers, leading to unfairness in the revenue and profit they earn. In this paper, we propose a new contracts-based resource sharing model for federated geo-distributed clouds that allows CSPs to establish resource sharing contracts with individual datacenters a priori for defined time intervals during a 24-hour period. Based on the established contracts, individual CSPs employ a contract cost- and duration-aware job scheduling and provisioning algorithm that enables jobs to complete and meet their response time requirements while achieving both global resource allocation efficiency and local fairness in the profit earned. The proposed techniques are evaluated through extensive experiments using realistic workloads generated from the SHARCNET cluster trace. The experiments demonstrate the effectiveness, scalability, and resource-sharing fairness of the proposed model.
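The contract-aware scheduling idea described above could be sketched as a greedy choice: among the datacenters a CSP holds contracts with, pick the cheapest one whose contract window and remaining capacity still let the job finish before its deadline. This is a minimal illustration under assumed data structures; all class and field names are hypothetical, not from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Contract:
    datacenter: str
    start_hour: int           # contract window within the 24-hour day
    end_hour: int
    cost_per_cpu_hour: float
    cpus_available: int

@dataclass
class Job:
    job_id: str
    cpus_needed: int
    runtime_hours: int
    deadline_hour: int

def schedule(job: Job, contracts: List[Contract]) -> Optional[Contract]:
    """Pick the cheapest contract whose window and capacity let the job
    finish before its deadline (greedy cost- and duration-aware choice)."""
    feasible = [
        c for c in contracts
        if c.cpus_available >= job.cpus_needed
        and c.start_hour + job.runtime_hours <= min(c.end_hour, job.deadline_hour)
    ]
    if not feasible:
        return None
    # Total job cost = price * CPUs * runtime; minimize it over feasible contracts.
    best = min(feasible,
               key=lambda c: c.cost_per_cpu_hour * job.cpus_needed * job.runtime_hours)
    best.cpus_available -= job.cpus_needed
    return best
```

A job needing 16 CPUs for 4 hours with a deadline at hour 10 would, for example, be placed on a contracted datacenter whose window covers hours 0-10 and whose capacity covers the request.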
Datacenter Traffic Control: Understanding Techniques and Trade-offs
Datacenters provide cost-effective and flexible access to scalable compute
and storage resources necessary for today's cloud computing needs. A typical
datacenter is made up of thousands of servers connected with a large network
and usually managed by one operator. To provide quality access to the variety
of applications and services hosted on datacenters and maximize performance, it
is necessary to use datacenter networks effectively and efficiently.
Datacenter traffic is often a mix of several classes with different priorities
and requirements. This includes user-generated interactive traffic, traffic
with deadlines, and long-running traffic. To this end, custom transport
protocols and traffic management techniques have been developed to improve
datacenter network performance.
In this tutorial paper, we review the general architecture of datacenter
networks, various topologies proposed for them, their traffic properties,
general traffic control challenges in datacenters and general traffic control
objectives. The purpose of this paper is to bring out the important
characteristics of traffic control in datacenters, not to survey all
existing solutions (which is virtually impossible given the massive body of
existing research). We hope to give readers a wide range of options and
factors to consider when evaluating traffic control mechanisms. We discuss
various characteristics of datacenter traffic control including management
schemes, transmission control, traffic shaping, prioritization, load balancing,
multipathing, and traffic scheduling. Next, we point to several open challenges
as well as new and interesting networking paradigms. At the end of this paper,
we briefly review inter-datacenter networks, which connect geographically
dispersed datacenters, have been receiving increasing attention recently,
and pose interesting and novel research problems.
Comment: Accepted for publication in IEEE Communications Surveys and Tutorials
FedZero: Leveraging Renewable Excess Energy in Federated Learning
Federated Learning (FL) is an emerging machine learning technique that
enables distributed model training across data silos or edge devices without
data sharing. Yet, FL inevitably introduces inefficiencies compared to
centralized model training, which will further increase the already high energy
usage and associated carbon emissions of machine learning in the future.
Although the scheduling of workloads based on the availability of low-carbon
energy has received considerable attention in recent years, it has not yet been
investigated in the context of FL. However, FL is a highly promising use case
for carbon-aware computing, as training jobs are energy-intensive
batch processes scheduled in geo-distributed environments.
We propose FedZero, an FL system that operates exclusively on renewable excess
energy and spare capacity of compute infrastructure to effectively reduce the
training's operational carbon emissions to zero. Based on energy and load
forecasts, FedZero leverages the spatio-temporal availability of excess energy
by cherry-picking clients for fast convergence and fair participation. Our
evaluation, based on real solar and load traces, shows that FedZero converges
considerably faster under the mentioned constraints than state-of-the-art
approaches, is highly scalable, and is robust against forecasting errors.
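The client-selection step described above, picking clients whose forecast excess energy and spare capacity suffice for a round while keeping participation fair, could be sketched as a filter followed by a fairness-ordered choice. This is a minimal sketch under assumed interfaces; the field names and the least-participated-first fairness rule are illustrative assumptions, not FedZero's actual algorithm.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Client:
    name: str
    excess_energy_forecast: float  # forecast excess renewable energy for the round
    spare_capacity: float          # fraction of compute currently free
    rounds_participated: int = 0

def select_clients(clients: List[Client], energy_needed: float,
                   capacity_needed: float, k: int) -> List[Client]:
    """Keep only clients whose forecasts cover the round's energy and
    capacity needs, then favor under-represented clients for fairness."""
    eligible = [c for c in clients
                if c.excess_energy_forecast >= energy_needed
                and c.spare_capacity >= capacity_needed]
    chosen = sorted(eligible, key=lambda c: c.rounds_participated)[:k]
    for c in chosen:
        c.rounds_participated += 1
    return chosen
```

Re-running the selection each round naturally tracks the spatio-temporal availability of excess energy: a client drops out of the eligible set whenever its forecast no longer covers the round.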
Carbon Responder: Coordinating Demand Response for the Datacenter Fleet
The increasing integration of renewable energy sources results in
fluctuations in carbon intensity throughout the day. To mitigate their carbon
footprint, datacenters can implement demand response (DR) by adjusting their
load based on grid signals. However, this presents challenges for private
datacenters with diverse workloads and services. One of the key challenges is
efficiently and fairly allocating power curtailment across different workloads.
In response to these challenges, we propose the Carbon Responder framework.
The Carbon Responder framework aims to reduce the carbon footprint of
heterogeneous workloads in datacenters by modulating their power usage. Unlike
previous studies, Carbon Responder considers both online and batch workloads
with different service level objectives and develops accurate performance
models to achieve performance-aware power allocation. The framework supports
three alternative policies: Efficient DR, Fair and Centralized DR, and Fair and
Decentralized DR. We evaluate Carbon Responder policies using production
workload traces from a private hyperscale datacenter. Our experimental results
demonstrate that the efficient Carbon Responder policy achieves around 2x the
carbon footprint reduction of baseline approaches adapted from existing
methods. The fair Carbon Responder policies distribute the performance
penalties and carbon-reduction responsibility fairly among workloads.
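The "efficient" allocation idea, curtailing power where the modeled performance penalty per watt is lowest, could be sketched as a greedy pass over workloads sorted by penalty, cutting each up to its curtailable headroom until the demand-response target is met. All names and the tuple layout are illustrative assumptions; the paper's actual performance models and policies are more involved.

```python
from typing import Dict, List, Tuple

# Each workload: (name, modeled performance penalty per curtailed watt,
#                 maximum watts it can give up without violating its SLO).
Workload = Tuple[str, float, float]

def allocate_curtailment(workloads: List[Workload],
                         total_cut_watts: float) -> Dict[str, float]:
    """Greedy 'efficient' sketch: curtail the cheapest-to-cut workloads
    first, each up to its headroom, until the DR target is covered."""
    remaining = total_cut_watts
    allocation: Dict[str, float] = {}
    for name, _penalty, headroom in sorted(workloads, key=lambda w: w[1]):
        cut = min(headroom, remaining)
        allocation[name] = cut
        remaining -= cut
        if remaining <= 0:
            break
    return allocation
```

A "fair" policy would instead spread the cut across workloads, e.g. proportionally to headroom, trading some efficiency for an even distribution of performance penalties.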
- …