Optimizing the Energy Consumption of Servers and Networks in Cloud Data Centers by Liu, Jun
Optimizing the Energy Consumption of Servers and Networks
in Cloud Data Centers
by
Jun Liu
A dissertation submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
(Information Systems Engineering)
in the University of Michigan at Dearborn
2015
Doctoral Committee:
Associate Professor Jinhua Guo, Chair
Professor Kiumi Akingbehin
Associate Professor Di Ma
Associate Professor Weidong Xiang
©Jun Liu
2015
I would like to lovingly dedicate this dissertation to my Mother
and my daughter.
ACKNOWLEDGMENTS
This dissertation would not have been successfully completed without the con-
tributions of many people. First of all, I would like to thank my thesis advisor,
Dr. Jinhua Guo, who has given me the opportunity to undertake a PhD and
provided me with invaluable guidance and advice throughout my PhD training.
I would like to thank Dr. Jinhua Guo for supporting me during these years. I
am also grateful to Dr. Di Ma for discussions and suggestions on related topics
that helped me improve my knowledge in the area. I would like to express my
gratitude to all my PhD committee members, Dr. Kiumi Akingbehin, Dr. Di
Ma, and Dr. Weidong Xiang, for their constructive comments and suggestions
on improving my work.
TABLE OF CONTENTS
Dedication
Acknowledgments
List of Figures
List of Tables
Abstract
Chapter
1 Introduction
1.1 Real-Time Task Scheduling Problem on Multi-core Processors with Voltage Islands
1.2 Virtual Network Mapping Problem in Data Centers
1.3 Research Problems and Objectives
1.4 Contributions
1.5 Thesis Organization
2 Review of the State of the Art
2.1 Energy-Efficient Hardware
2.1.1 Dynamic Voltage and Frequency Scaling
2.1.2 Multiple Supply Voltage
2.1.3 Dynamic Power Management
2.2 Energy-Efficient Multi-Core Processor with Voltage Island Model
2.2.1 Multi-Core Processor
2.2.2 Voltage Island
2.2.3 Energy-Efficient Task Scheduling in Multi-Core Processors
2.3 Virtualization
2.3.1 Virtualization
2.3.2 Bandwidth Guarantee
2.3.3 Pipe Model
2.3.4 Hose Model
2.4 Data Center Architecture
2.4.1 Data Center Architecture
2.4.2 Energy-Efficient Data Center
3 Preliminary Work: Power Saving Design for a Single Server
3.1 Introduction
3.2 System Model
3.3 The Design of the PowerSleep Scheme
3.4 Power Consumption and Response Time Analysis
3.5 Conclusion
4 Voltage Island Aware Energy Efficient Scheduling of Real-Time Tasks on Multi-Core Processors
4.1 Introduction
4.2 System Models
4.2.1 Power Model
4.2.2 Periodic Real-Time Task Model
4.2.3 Multi-Core Processors with Voltage Islands
4.3 Problem Definition and An Approximation Algorithm
4.3.1 Voltage Island Energy-Efficiency Scheduling (VIEES)
4.3.2 Energy Consumption
4.3.3 Voltage Island Largest Capacity First (VILCF) Algorithm
4.4 Lower Bound of Energy Consumption of the VIEES Problem
4.4.1 Semi-VIEES Problem
4.4.2 Optimal Solution of Semi-VIEES Problem
4.5 Approximation Ratio of Algorithm VILCF
4.6 Simulation Results
4.7 Summary
5 Traffic and Energy Aware Virtual Network Mapping in Data Centers
5.1 Introduction
5.2 The Model
5.2.1 Network Architectures
5.2.2 VM Communication Bandwidth Allocation
5.2.3 Problem Definition
5.3 The Solution
5.3.1 Traffic-Aware VM Packing
5.3.2 Virtual Network Mapping
5.3.3 VM-Pair Flow Routing
5.4 Performance Evaluation
5.4.1 VM Packing
5.4.2 Virtual Network Mapping
5.5 Summary
6 Conclusions
Bibliography
LIST OF FIGURES
2.1 Pipe Model
2.2 Hose Model
2.3 Fat-tree topology
2.4 VL2 topology
3.1 Scenarios for PowerSleep
4.1 Power consumption versus speed
4.2 Optimal schedule of the Semi-VIEES problem if $\sum_{m=0}^{M-1}\sum_{k=0}^{K-1}\left(l_m^{\max} - l'_{m,k}\right) \le \sum_{\tau_i \in T \setminus T^{f}} \frac{c_i}{p_i}$, M = 3, K = 3
4.3 Optimal schedule of the Semi-VIEES problem if $\sum_{m=0}^{M-1}\sum_{k=0}^{K-1}\left(l_m^{\max} - l'_{m,k}\right) > \sum_{\tau_i \in T \setminus T^{f}} \frac{c_i}{p_i}$, M = 3, K = 3
4.4 Upper bound of energy consumption of the VIEES problem, M = 3, K = 3
4.5 The energy consumption approximation versus the number of blocks
4.6 Comparison of normalized energy between VILCF and LTF when the number of cores is 128, the number of blocks ranges over $[2^{1}, 2^{5}]$, and the task-set size ranges over [150, 550]
4.7 Power consumption approximation when η ranges from 1.2 to 4 in steps of 0.2; β is set to 2
4.8 Power consumption approximation when β ranges from 0 to 5; η is set to 3
5.1 Three-tier fat-tree topology
5.2 Graph of a tenant virtual network with 17 nodes, where $C_s = 12$ and $C_l = 24$
5.3 Merge VMs 13 and 14, and VMs 15 and 16, respectively
5.4 Maximum spanning tree of the graph
5.5 Partition VMs 1, 2, 3; VMs 4, 5, 6; and VMs 7, 8, 9; then merge them into three super VMs
5.6 Second-round merging
5.7 Second-round partitioning
5.8 The PM ratio when the number of VMs ranged from 50 to 600 in steps of 25, and the mean degree of VMs was 8
5.9 The inter-PM traffic ratio when the number of VMs ranged from 50 to 600 in steps of 25, and the mean degree of VMs was 8
5.10 The PM ratio when the mean degree of VMs ranged from 4 to 14 in steps of 1, and the number of VMs was 200
5.11 The inter-PM traffic ratio when the mean degree of VMs ranged from 4 to 14 in steps of 1, and the number of VMs was 200
5.12 The number of PMs when the number of VMs ranged from 50 to 600 in steps of 25, and the mean degree of VMs was 8 in the log-normal distribution with σ = 1.5
5.13 The total inter-PM traffic volume when the number of VMs ranged from 50 to 600 in steps of 25, and the mean degree of VMs was 8 in the log-normal distribution with σ = 1.5
5.14 The number of PMs when the mean degree of VMs ranged from 4 to 14 in steps of 1 in the log-normal distribution with σ = 1.5, and the number of VMs was 200
5.15 The number of PMs when the mean degree of VMs ranged from 4 to 14 in steps of 1 in the log-normal distribution with σ = 1.5, and the number of VMs was 200
5.16 Simulation results when the data center network is a fat-tree with K = 16: cut traffic of Algorithm VN-Mapping to Random
5.17 Simulation results when the data center network is a fat-tree with K = 16: cut traffic of Algorithm VN-Mapping to Reverse cut
LIST OF TABLES
5.1 Notation
ABSTRACT
A data center is a cost-effective infrastructure for storing large volumes of data and hosting large-scale service applications. Cloud computing service providers are rapidly deploying data centers, each with a huge number of servers and switches, across the world. These data centers consume significant amounts of energy, contributing to high operational costs. Thus, optimizing the energy consumption of servers and networks in data centers can reduce operational costs.
In a data center, power consumption is mainly due to servers, networking devices, and cooling systems. An effective energy-saving strategy is to consolidate computation and communication onto a smaller number of servers and network devices, and then power off as many unneeded servers and network devices as possible.
In this thesis, we propose several novel methods to reduce the energy consumption of computer
systems and networks in data centers, while satisfying Quality of Service (QoS) requirements
specified by cloud tenants.
First, we study energy-efficient scheduling of periodic real-time tasks on multi-core processors with voltage islands, in which cores are partitioned into multiple blocks (termed voltage islands). We propose a Voltage Island Largest Capacity First (VILCF) algorithm for energy-efficient scheduling of periodic real-time tasks on multi-core processors. It achieves better energy efficiency by fully utilizing the remaining capacity of an island before turning on more islands or increasing the voltage level of the currently active islands. We provide a detailed theoretical analysis of the approximation ratio of the proposed VILCF algorithm in terms of energy efficiency.
Second, we study the resource allocation problem for virtual networks in data centers. A cloud tenant specifies a computation requirement for each virtual machine (VM) and a bandwidth requirement for each pair of VMs. A cloud provider places the VMs and routes the traffic among them in a way that minimizes the total number of servers and switches used, while providing both computation and bandwidth guarantees. The unneeded servers and switches are powered off to conserve energy. We formulate a special Multi-Capacity Bin Packing problem that consolidates VMs onto the fewest servers. We present a weighted graph cut algorithm to map the consolidated virtual network into a data center network in a way that minimizes the number of links and switches used.
CHAPTER 1
Introduction
Data centers are cost-effective infrastructures for storing large volumes of data and hosting large-
scale service applications. Data centers contain hundreds of thousands of servers, interconnected
via switches, routers and high-speed links. Today, large companies such as Amazon, Google,
Facebook, and Yahoo! routinely use data centers for storage, web search, and large-scale com-
putations [1]. With the rise of cloud computing, service hosting in data centers has become a
multi-billion dollar business that plays a crucial role in the future Information Technology indus-
try.
However, a large-scale computing infrastructure consumes enormous amounts of electrical
power, leading to operational costs that can exceed the cost of the infrastructure itself within a
few years. In 2013, U.S. data centers consumed an estimated 91 billion kilowatt-hours of electric-
ity, equivalent to the annual output of 34 large (500-megawatt) coal-fired power plants. The annual
electricity consumption of data centers is projected to increase to roughly 140 billion kilowatt-
hours by 2020, the equivalent annual output of 50 power plants, costing American businesses $13
billion annually in electricity bills and emitting nearly 100 million metric tons of carbon pollution
per year [2].
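As a rough consistency check on the figures above, which are taken from [2], the sketch below derives two implied quantities: the fraction of nameplate output each 500-megawatt plant would need to deliver, and the average electricity price behind the $13 billion estimate. The derived values are our own arithmetic under the assumption of an 8,760-hour year; they are not stated in [2].

```python
# Back-of-the-envelope check of the data center energy figures quoted from [2].
# Assumption: a 500-MW plant can deliver at most 500 MW for all 8,760 hours of a year.
HOURS_PER_YEAR = 8760
PLANT_KW = 500_000  # 500 MW expressed in kW


def implied_capacity_factor(total_kwh: float, n_plants: int) -> float:
    """Fraction of nameplate output each plant must deliver to supply total_kwh."""
    max_kwh_per_plant = PLANT_KW * HOURS_PER_YEAR  # 4.38 billion kWh per plant-year
    return total_kwh / n_plants / max_kwh_per_plant


cf_2013 = implied_capacity_factor(91e9, 34)   # 2013: 91 billion kWh, 34 plants
cf_2020 = implied_capacity_factor(140e9, 50)  # 2020 projection: 140 billion kWh, 50 plants
price_per_kwh = 13e9 / 140e9                  # implied average price in $/kWh

print(f"capacity factor 2013: {cf_2013:.2f}, 2020: {cf_2020:.2f}")
print(f"implied price: ${price_per_kwh:.3f}/kWh")
```

Both plant equivalences imply a capacity factor of roughly 0.61 to 0.64, and the cost estimate implies about 9.3 cents per kWh, so the quoted figures are mutually consistent.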
In a data center, power consumption is mainly due to servers, networking devices, and cooling
systems. There are two main approaches for reducing the energy consumption of data centers:
(a) shutting down devices or (b) scaling down performance. The former, commonly referred to as Dynamic Power Management (DPM), results in the greatest savings, since the average workload often remains below 30% of capacity in cloud computing systems [3]. The latter corresponds to Dynamic Voltage and Frequency Scaling (DVFS), which adjusts the performance and power consumption of the hardware to match the characteristics of the workload.
Virtualization is a key technology for the efficient operation of cloud data centers. Data center resources are often underutilized: the average load is about 30% of capacity [3]. Energy consumption in virtualized data centers can be reduced by deciding carefully which physical server should host each virtual machine (VM). Virtual machine consolidation strategies try to use the fewest possible physical machines to host a given set of virtual machines. According to an Open Compute Project report [4], 93% of the energy consumption in a data center depends on the efficient utilization of its computing resources.
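To illustrate consolidation in its simplest form, the sketch below packs VMs onto identical servers with First-Fit Decreasing (FFD), a classical bin-packing heuristic. It is not the method developed in this thesis: it considers a single resource (normalized CPU demand) and ignores VM-to-VM traffic. The VM names and demands are hypothetical.

```python
from typing import Dict, List


def first_fit_decreasing(vm_cpu: Dict[str, float], capacity: float = 1.0) -> List[List[str]]:
    """Consolidate VMs onto identical servers with First-Fit Decreasing.

    vm_cpu maps a VM name to its CPU demand, normalized so that one server
    offers `capacity`. Returns one list of VM names per active server.
    """
    servers: List[List[str]] = []
    free: List[float] = []  # remaining capacity of each active server
    for vm, demand in sorted(vm_cpu.items(), key=lambda kv: kv[1], reverse=True):
        if demand > capacity:
            raise ValueError(f"VM {vm} does not fit on an empty server")
        for i, room in enumerate(free):
            if demand <= room + 1e-9:  # tolerance for floating-point round-off
                servers[i].append(vm)
                free[i] -= demand
                break
        else:  # no active server has room: power on another one
            servers.append([vm])
            free.append(capacity - demand)
    return servers


if __name__ == "__main__":
    vms = {"web": 0.6, "db": 0.4, "cache": 0.3, "batch": 0.5, "log": 0.2}
    # Total demand is 2.0, so at least two servers are needed; FFD reaches that bound here.
    print(first_fit_decreasing(vms))
```

FFD is fast and is known to stay asymptotically within 11/9 of the optimum for a single resource, but it has no notion of which VMs communicate with each other, which matters once network energy is also considered.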
In this thesis, we propose several novel methods to reduce the energy consumption of computer
systems and networks in data centers, while satisfying Quality of Service (QoS) requirements
specified by cloud tenants.
1.1 Real-Time Task Scheduling Problem on Multi-core Processors with Voltage Islands
As improvements in the clock speed of single processors slowed, manufacturers shifted to placing multiple independent processors (known as cores) on a single chip, called a multi-core processor. Multi-core processors can achieve higher throughput at the same clock frequency. In addition, the proximity of multiple cores on the same chip allows the cache coherency circuitry to operate at a much higher frequency. However, power management becomes more challenging for real-time systems that use multi-core processors.
To reduce overall power consumption in multi-core processors, the use of multiple supply voltages has been widely adopted. This approach is often realized through core-based voltage islands [5]. A voltage island is a group of on-chip cores powered by the same voltage source, independent of the chip-level voltage supply. More voltage islands and finer-grained control of the shutdown mechanism lead to lower power consumption.
Power consumption of chips generally breaks down into two sources [6]: dynamic power and leakage power. The dynamic power of a core is an increasing, convex function of the core's working frequency. Because leakage power is drawn for as long as a task runs, running too slowly also wastes energy; hence there is a frequency at which the energy efficiency of a core is optimal. The dynamic power of a core can be reduced by executing at a lower frequency using the Dynamic Voltage and Frequency Scaling (DVFS) technique [7,8]. In general, if the workloads across all cores are balanced, each core can work at a relatively low frequency and thus consume less dynamic power.
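To make the existence of an energy-optimal frequency concrete, the sketch below assumes a normalized power model P(s) = β + s^η, where s is the core speed, β is a speed-independent (leakage) term, and η > 1. This model and its parameter values are assumptions for illustration, not necessarily the exact model used in later chapters. The energy to finish one unit of work is P(s)/s = β/s + s^(η−1): the first term falls and the second rises with speed, and setting the derivative to zero gives s* = (β/(η−1))^(1/η).

```python
# Energy-optimal core speed under an ASSUMED power model P(s) = beta + s**eta,
# where s is the normalized speed, beta a speed-independent (leakage) term, eta > 1.


def energy_per_unit_work(s: float, beta: float, eta: float) -> float:
    """Energy to complete one unit of work at speed s: power times time (1/s)."""
    return (beta + s ** eta) / s


def optimal_speed(beta: float, eta: float) -> float:
    """Minimizer of beta/s + s**(eta - 1), i.e. s* = (beta / (eta - 1)) ** (1 / eta)."""
    if eta <= 1:
        raise ValueError("eta must be greater than 1")
    return (beta / (eta - 1)) ** (1.0 / eta)


if __name__ == "__main__":
    beta, eta = 2.0, 3.0
    s_star = optimal_speed(beta, eta)  # (2 / 2) ** (1/3) = 1.0
    for s in (0.5, s_star, 2.0):
        print(f"s = {s:.2f}: energy per unit work = {energy_per_unit_work(s, beta, eta):.3f}")
```

Running below s* wastes leakage energy over a longer execution time; running above it wastes dynamic energy.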
Leakage power comes from circuit leakage current. As the frequency of a core is lowered, dynamic power decreases, but the proportion of leakage power increases until leakage becomes the dominant component of the core's power consumption. To reduce the leakage power of a multi-core processor, an island whose cores are all idle can be powered off using dynamic power management (DPM) [9].
As the demand for concurrent processing and energy efficiency grows, multi-core processors are expected to become widely used in real-time systems. Energy-efficient task scheduling under various deadline constraints has received a lot of attention. In partitioned scheduling, real-time tasks are divided into subsets, and each subset is assigned to one core. Because dynamic power is a convex function of core speed, the workloads across all cores should be balanced so that each core works at a relatively low frequency and thus consumes less dynamic power. However, it is very challenging to assign real-time tasks to cores so that the workload of each core is balanced.
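A common way to approximate balanced partitioning is a largest-task-first (worst-fit decreasing) heuristic: sort tasks by utilization $c_i/p_i$ and give each one to the currently least-loaded core. The sketch below assumes implicit-deadline periodic tasks scheduled by EDF on each core, so a core with total utilization u ≤ 1 meets all deadlines at normalized speed u, and it assumes dynamic power proportional to speed cubed. Both the heuristic and the task set are illustrative, not the algorithm proposed in Chapter 4.

```python
import heapq
from typing import List, Tuple


def largest_task_first(utils: List[float], n_cores: int) -> List[List[int]]:
    """Assign tasks (utilization c_i / p_i) to cores, largest task to least-loaded core."""
    heap: List[Tuple[float, int]] = [(0.0, core) for core in range(n_cores)]  # (load, core id)
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(n_cores)]
    for task in sorted(range(len(utils)), key=lambda i: utils[i], reverse=True):
        load, core = heapq.heappop(heap)  # currently least-loaded core
        assignment[core].append(task)
        heapq.heappush(heap, (load + utils[task], core))
    return assignment


def dynamic_power(loads: List[float], eta: float = 3.0) -> float:
    """Sum of per-core dynamic power, assuming EDF at speed = utilization and P ~ s**eta."""
    if any(u > 1.0 + 1e-9 for u in loads):
        raise ValueError("a core is overloaded (utilization > 1)")
    return sum(u ** eta for u in loads)


if __name__ == "__main__":
    utils = [0.5, 0.4, 0.3, 0.3, 0.2, 0.1]
    cores = largest_task_first(utils, n_cores=2)
    loads = [sum(utils[i] for i in core) for core in cores]
    print(cores, [round(u, 2) for u in loads])
    print("balanced:", round(dynamic_power(loads), 3),
          "unbalanced:", round(dynamic_power([1.0, 0.8]), 3))
```

Both layouts carry the same total utilization of 1.8, but the convexity of the power function makes the balanced one cheaper (about 1.458 versus 1.512 in normalized units).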
1.2 Virtual Network Mapping Problem in Data Centers
Since average server utilization in data centers is only 20%-30%, one way to improve resource utilization and reduce energy consumption is to dynamically consolidate Virtual Machines (VMs) onto a smaller number of physical machines using virtualization technology. Virtualization partitions available resources and shares them among different tenants. Server virtualization allows cloud providers to create multiple VM instances on a single physical server, thus improving server utilization. Server virtualization also allows VMs to migrate between servers to consolidate workloads and reduce the number of active servers in data centers. Network virtualization aims at creating multiple virtual networks to improve network utilization. With such virtualization, resources can be scheduled at a fine granularity. Energy consumption can then be reduced by powering off idle servers and switches, thus eliminating their leakage power consumption.
In addition to energy efficiency, network performance in data centers is another important concern. Cloud providers typically do not offer guaranteed network resources to tenants. The bandwidth achieved by traffic flows between a tenant's VMs depends on a variety of factors outside the tenant's control, such as the network load and the placement of VMs.
Currently, most cloud data centers guarantee only computation resources (CPU, memory, and storage) for each VM, while networks are shared among tenants in a best-effort manner. Consequently, tenants experience variable and unpredictable network performance [10]. Yet a wide range of applications, such as user-facing web applications, transaction processing web applications, and MapReduce-like data-intensive applications, need predictable performance. Without guaranteed network performance, the cloud cannot effectively support the classes of applications that rely on predictable performance.
To guarantee network performance for tenants, Hetish proposed abstracting tenant requirements as a virtual network (VN). An effective network abstraction model [11] serves two purposes. First, it lets tenants specify their network requirements in a simple and intuitive, yet accurate, manner. Second, it facilitates the translation of these requirements into an efficient deployment on low-level infrastructure components.
There are two popular virtual network abstractions: the hose model and the pipe model. In the hose model abstraction, used by Oktopus [10] and ElasticSwitch [12], all VMs are connected to a central (virtual) switch by a dedicated link (hose) that has a minimum bandwidth guarantee. This model mimics dedicated clusters whose compute nodes are connected through Ethernet switches. However, it does not accurately express the requirements of applications with complex traffic interactions. The pipe model abstraction, used by SecondNet [13] and CloudNaaS [14], specifies bandwidth guarantees between pairs of VMs as virtual pipes. When traffic is predictable and relatively stable, this model can precisely capture an application's traffic needs. In this thesis, we use the pipe model to describe the traffic needs of tenants.
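The difference between the two abstractions shows up when computing how much bandwidth must be reserved on a physical link that separates a tenant's VMs into two groups. In the hose model, the reservation is the worst case the hoses permit: traffic across the link can exceed neither the total hose rate on one side nor that on the other. In the pipe model, only the pipes that cross the link count. A minimal sketch with hypothetical VM names and rates:

```python
from typing import Dict, Set, Tuple


def hose_link_demand(side_a: Set[str], side_b: Set[str], hose: Dict[str, float]) -> float:
    """Hose model: worst-case traffic across a link separating side_a from side_b.

    Each VM's total traffic is capped by its hose rate, so traffic from side_a
    to side_b can exceed neither side_a's total sending rate nor side_b's total
    receiving rate.
    """
    return min(sum(hose[vm] for vm in side_a), sum(hose[vm] for vm in side_b))


def pipe_link_demand(side_a: Set[str], pipes: Dict[Tuple[str, str], float]) -> float:
    """Pipe model: sum of the guarantees of pipes whose endpoints lie on opposite sides."""
    return sum(bw for (u, v), bw in pipes.items() if (u in side_a) != (v in side_a))


if __name__ == "__main__":
    left, right = {"a", "b"}, {"c", "d"}
    hose = {"a": 100.0, "b": 100.0, "c": 100.0, "d": 100.0}          # Mbps per VM
    pipes = {("a", "b"): 90.0, ("c", "d"): 90.0, ("b", "c"): 10.0}   # Mbps per VM pair
    print("hose:", hose_link_demand(left, right, hose))  # worst case permitted by the hoses
    print("pipe:", pipe_link_demand(left, pipes))        # only the b-c pipe crosses the link
```

If the tenant's traffic really follows these pipes, the hose abstraction forces the provider to reserve 200 Mbps on that link where 10 Mbps would suffice; this over-provisioning is what the pipe model avoids when traffic is predictable.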
Mapping multiple virtual networks into a data center network is very challenging. The data center must provide all the resources that each virtual network requires, including the CPU and memory requirements of each VM and the bandwidth requirement of each pair of VMs. Virtual network mapping must analyze tenant VNs, identify groups of VMs with heavy traffic flows among them, and consolidate these groups onto physical machines in close proximity to reduce the number of active servers and switches.
Since today’s data center networks are designed with redundant links and switches, virtual network mapping must also optimize flow routing across the data center network to minimize the number of active switches. VN mapping can save energy by shutting down unused physical machines and switches. Jointly optimizing server and network energy consumption is a challenging problem in VN mapping. A better mapping strategy can minimize the number of active physical machines and switches, increase resource utilization, and thus save energy.
1.3 Research Problems and Objectives
This thesis tackles the research challenges in the aforementioned two problems: Real-Time Task
Scheduling Problem on Multi-Core Processors with Voltage Islands and Virtual Network Mapping
Problem in Data Centers.
In the first problem, this thesis studies energy-efficient scheduling of periodic real-time tasks on multi-core processors with voltage islands, in which cores are homogeneous and partitioned into multiple blocks (termed voltage islands), and each block has its own voltage supply. Cores in the same block always operate at the same voltage level, but that level can be adjusted using Dynamic Voltage and Frequency Scaling (DVFS). In particular, the following research problems are investigated:
• How to schedule real-time tasks onto a multi-core processor under the voltage island model?
• How to choose core speeds according to the workload?
• What is a lower bound on the energy consumption of an optimal schedule?
• What is an upper bound on the energy consumption of the proposed scheduling algorithm?
• How to derive the approximation ratio of a scheduling algorithm?
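The voltage-island constraint changes the balancing argument above: because all cores in an island share one supply voltage, the island must run at the speed of its most heavily loaded core, and an island can be powered off only when all of its cores are idle. The sketch below scores a given assignment under hypothetical assumptions: per-core power β + s^η with β = 2 and η = 3, and every core of an active island drawing power at the island speed. It only evaluates assignments; it is not the VILCF algorithm.

```python
from typing import List


def chip_power(islands: List[List[float]], beta: float = 2.0, eta: float = 3.0) -> float:
    """Total power of a chip for a given per-core load assignment.

    islands[m][k] is the utilization (required normalized speed) of core k in
    voltage island m. Assumptions (hypothetical model): all cores of an active
    island run at the island speed, i.e. the maximum core load, and each
    consumes beta + speed**eta; an island whose cores are all idle is powered off.
    """
    total = 0.0
    for cores in islands:
        speed = max(cores)
        if speed == 0.0:
            continue  # every core idle: island shut down by DPM
        total += len(cores) * (beta + speed ** eta)
    return total


if __name__ == "__main__":
    # The same total load (1.6) placed three ways on two islands of two cores each.
    packed = [[0.8, 0.8], [0.0, 0.0]]    # one island active
    balanced = [[0.4, 0.4], [0.4, 0.4]]  # both islands active, low speed
    spread = [[0.8, 0.0], [0.8, 0.0]]    # both islands active, high speed
    for name, layout in (("packed", packed), ("balanced", balanced), ("spread", spread)):
        print(name, round(chip_power(layout), 3))
```

With β = 2 the packed layout is cheapest, in line with the intuition of filling an active island before turning on another; with β = 0 the balanced layout would win (0.256 versus 1.024), so the best choice depends on the balance between leakage and dynamic power.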
In the second problem, this thesis investigates the energy-efficient resource allocation problem for virtual networks in cloud data centers with a three-tier fat-tree topology. A tenant’s request can be abstracted as a virtual network consisting of a set of virtual machines (VMs) and links between pairs of VMs. A cloud provider places the VMs and routes the traffic among them in a way that minimizes the total number of servers and switches used, while providing both a computation guarantee for each VM and a bandwidth guarantee for each pair of VMs. The unneeded servers and switches are powered off to conserve energy. In particular, the following research problems are investigated:
• What is the bandwidth constraint in a three-tier fat-tree network?
• How to formalize the Virtual Network Mapping problem under computation and communi-
cation constraints?
• How to consolidate VMs into a smaller number of physical servers in a way that minimizes
the number of servers used?
• How to identify traffic patterns in virtual networks?
• What is the time complexity of VM packing?
• How to map VM clusters into physical machines in a data center?
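For reference when reasoning about the three-tier fat-tree topology, the element counts of the standard k-ary fat-tree (k pods, built entirely from identical k-port switches) follow directly from its construction. The sketch below assumes that standard construction; the function name and dictionary keys are our own.

```python
def fat_tree_size(k: int) -> dict:
    """Element counts of a standard three-tier k-ary fat-tree built from k-port switches."""
    if k < 2 or k % 2:
        raise ValueError("k must be a positive even number")
    half = k // 2
    edge = k * half          # k pods, k/2 edge switches per pod
    aggregation = k * half   # k pods, k/2 aggregation switches per pod
    core = half * half       # (k/2)^2 core switches
    hosts = edge * half      # each edge switch serves k/2 hosts
    return {"edge": edge, "aggregation": aggregation, "core": core,
            "switches": edge + aggregation + core, "hosts": hosts}


if __name__ == "__main__":
    print(fat_tree_size(16))
```

Each edge switch uses k/2 ports toward hosts and k/2 toward aggregation switches, so with equal link capacities the topology offers full bisection bandwidth; the same path redundancy is what allows unused switches to be powered off when traffic is consolidated.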
1.4 Contributions
The contributions of this thesis can be broadly divided into three categories: a survey and analysis of power-saving strategies; a competitive analysis of the periodic real-time task scheduling problem on multi-core processors with voltage islands; and an analysis of virtual machine placement and flow routing for communication-intensive service applications in data centers. The main contributions are:
• A review of the state-of-the-art in energy-efficient computing.
• Competitive analysis of periodic real-time task scheduling problem on multi-core processors
with voltage islands.
– Formal definitions of the Voltage Island Energy-Efficient Real-Time Task Scheduling
(VIEES) problems.
– Proposed a Voltage Island Largest Capacity First (VILCF) algorithm for energy-efficient
scheduling of periodic real-time tasks on multi-core processors.
– Analysis of the lower bound of the energy consumption of the optimal solutions for
Voltage Island Energy Efficient Scheduling (VIEES) problem.
– Theoretical analysis of the approximation ratio of the proposed VILCF algorithm, in
terms of energy efficiency, against the lower bound of the optimal solution.
• Virtual machine placement and flow routing for communication-intensive service applica-
tions in data centers.
– Formal definition of the Virtual Network Mapping problem in the fat-tree topology data
centers.
– Proof that the Virtual Network Mapping problem is NP-hard.
– Proposed a two-phase VM placement and network routing algorithm that significantly reduces both the number of active servers and the number of active switches needed to provide QoS guarantees.
* Virtual Machine Packing: formulated a special Multi-Capacity Bin Packing prob-
lem that consolidates VMs into the fewest number of servers.
* Virtual Network Mapping: presented a weighted graph cut algorithm to map the
consolidated virtual network into a data center network that minimizes the number
of links and switches used.
The results from this work have been published in two journal papers [15] [16] and four con-
ference papers [17] [18] [19] [20].
1.5 Thesis Organization
In Chapter 2, we present a survey of energy-efficient computing systems, as well as the scope of this
thesis and its positioning in the area. We review the energy efficiency techniques in hardware de-
sign including Dynamic Voltage and Frequency Scaling (DVFS) and Dynamic Power Management
(DPM). Then, we present recent developments in multi-core processors with voltage islands and
energy efficient real-time task scheduling algorithms on multi-core processors. Furthermore, we
describe several new data center network architectures and two popular virtual network models.
In Chapter 3, we present preliminary work on power-saving design for servers under a response-time constraint. We introduce a smart power-saving scheme, PowerSleep, which targets power saving for a single server. PowerSleep minimizes the power consumption of a single server under a mean job response time constraint, adopting an extended M/G/1/PS queuing model for job arrival and job processing. PowerSleep adjusts the server's working frequency at run time using DVFS, puts the server into sleep mode once the queue is empty, and wakes the server when a new job arrives. To reduce the transition overhead, PowerSleep adds a procrastination sleep period when a new job arrives while the server is still in sleep mode. The server keeps sleeping during this period to collect more jobs in the queue. After the procrastination period, the server wakes up to process jobs. This approach reduces the mode-transition overhead, but it increases the job response time.
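The DVFS side of this trade-off can be illustrated with a standard M/G/1/PS result. With Poisson arrivals at rate λ, mean work E[X] per job, and server speed s (units of work per second), the load is ρ = λE[X]/s and the mean response time is (E[X]/s)/(1 − ρ) = E[X]/(s − λE[X]), independent of the work distribution. Requiring this to be at most a target D gives the minimum speed s ≥ λE[X] + E[X]/D. The sketch below ignores sleep, wake-up, and procrastination, so it is a simplification rather than PowerSleep itself; the parameter values are hypothetical.

```python
def mean_response_time_ps(arrival_rate: float, mean_work: float, speed: float) -> float:
    """Mean response time of an M/G/1/PS queue whose server runs at the given speed."""
    rho = arrival_rate * mean_work / speed
    if rho >= 1.0:
        raise ValueError("queue is unstable (load >= 1)")
    return (mean_work / speed) / (1.0 - rho)


def min_speed_ps(arrival_rate: float, mean_work: float, target: float) -> float:
    """Lowest speed whose M/G/1/PS mean response time does not exceed target."""
    return arrival_rate * mean_work + mean_work / target


if __name__ == "__main__":
    lam, work, deadline = 8.0, 0.1, 0.5  # jobs/s, work units per job, seconds (hypothetical)
    s = min_speed_ps(lam, work, deadline)
    print(s, mean_response_time_ps(lam, work, s))
```

Any slower speed violates the target, which is why the energy saved by sleeping must be weighed against the extra delay that sleeping and procrastination add.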
In Chapter 4, we investigate energy-efficient scheduling of periodic real-time tasks on multi-core processors with voltage islands. We propose a Voltage Island Largest Capacity First (VILCF) algorithm for energy-efficient scheduling of periodic real-time tasks on multi-core processors. It achieves better energy efficiency by fully utilizing the remaining capacity of an island before turning on more islands or increasing the voltage level of the currently active islands. We provide a detailed theoretical analysis of the approximation ratio of the proposed VILCF algorithm in terms of energy efficiency.
In Chapter 5, we explore the resource allocation problem for virtual networks in data centers. We formulate a special Multi-Capacity Bin Packing problem that consolidates VMs onto the fewest servers. A server is represented by a bin with multiple capacities corresponding to the available computation resources (CPU, memory, and storage) and communication bandwidth. Unlike in classical bin packing, the cumulative bandwidth demand of all VMs hosted on a server can be smaller than the sum of the individual VMs' bandwidth demands, because traffic between co-located VMs stays inside the server. Furthermore, we present a weighted graph cut algorithm to map the consolidated virtual network into a data center network in a way that minimizes the number of links and switches used.
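The co-location effect can be made concrete with a feasibility check for a single server. Under the pipe model, only pipes with exactly one endpoint on the server consume its network capacity, so the bandwidth a group of VMs needs is computed on the group rather than summed per VM. The sketch below is a hypothetical check with CPU and memory (storage is omitted for brevity) plus bandwidth; it is not the packing algorithm of Chapter 5, and the VM names and numbers are illustrative.

```python
from typing import Dict, Set, Tuple

Pipe = Tuple[str, str]


def external_bandwidth(group: Set[str], pipes: Dict[Pipe, float]) -> float:
    """Bandwidth of pipes with exactly one endpoint inside the group (traffic leaving the server)."""
    return sum(bw for (u, v), bw in pipes.items() if (u in group) != (v in group))


def fits_on_server(group: Set[str], demand: Dict[str, Tuple[float, float]],
                   pipes: Dict[Pipe, float],
                   cpu_cap: float, mem_cap: float, bw_cap: float) -> bool:
    """Multi-capacity check: CPU and memory add up, bandwidth counts only external pipes."""
    cpu = sum(demand[vm][0] for vm in group)
    mem = sum(demand[vm][1] for vm in group)
    return cpu <= cpu_cap and mem <= mem_cap and external_bandwidth(group, pipes) <= bw_cap


if __name__ == "__main__":
    demand = {"a": (4.0, 8.0), "b": (4.0, 8.0), "c": (2.0, 4.0), "d": (2.0, 4.0)}  # (cores, GB)
    pipes = {("a", "b"): 800.0, ("a", "c"): 100.0, ("b", "d"): 100.0}               # Mbps
    per_vm_sum = external_bandwidth({"a"}, pipes) + external_bandwidth({"b"}, pipes)
    print(per_vm_sum, external_bandwidth({"a", "b"}, pipes))
    print(fits_on_server({"a", "b"}, demand, pipes, cpu_cap=16.0, mem_cap=32.0, bw_cap=1000.0))
```

Summing per-VM demands would reject this placement (1,800 Mbps against a 1,000 Mbps link), whereas the co-located pair actually needs only 200 Mbps of external bandwidth.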
CHAPTER 2
Review of the State of the Art
2.1 Energy-Efficient Hardware
2.1.1 Dynamic Voltage and Frequency Scaling
Dynamic Voltage and Frequency Scaling (DVFS), a voltage-reduction technique originally developed for battery-operated systems, was introduced in the 1990s [21]. It can dramatically reduce power consumption in large digital systems by adapting both the voltage and the frequency of the system to changing workloads. Equipped with DVFS, a regulated system can lower the supply voltage of a digital circuit to the boundary at which the circuit still functions correctly, given the speed requirement, the temperature, and the technology parameters.
DVFS combines two power-saving techniques, dynamic frequency scaling and dynamic voltage scaling, which are used to save power in embedded systems such as cell phones.
The power consumption of a digital circuit is given by the well-known formula [21]:

$$P = f \cdot C \cdot V^2 + V \cdot I_{DC} \qquad (2.1)$$

where $f$ is the operating frequency, $C$ is the equivalent capacitance of the circuit, $I_{DC}$ is the static current, and $V$ is the supply voltage of the digital circuit. The first term is the dynamic power.
The power consumption of a digital circuit generally breaks down into two sources: dynamic power and leakage power. The dynamic power consumption of a digital circuit is $P_{dyn} = f \cdot C \cdot V^2$.
The following formula shows the dependency of the operating frequency on the supply voltage [22]:

$$f \propto \frac{(V - V_{th})^{\alpha}}{V} \qquad (2.2)$$

where $V_{th}$ is the threshold (switching) voltage and the exponent $\alpha$ is an experimentally derived constant that, for current technology, is approximately 1.3 to 2. The $V^2$ factor in the dynamic power term of Equation (2.1) suggests that reducing the supply voltage is the most effective way to decrease power consumption.
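As a rough numeric illustration of Equations (2.1) and (2.2), the sketch below computes the relative frequency and power at a few supply voltages. The values chosen for $C$, $I_{DC}$, $V_{th}$, and $\alpha$, and the normalization to the frequency at 1.0 V, are illustrative assumptions rather than measured parameters.

```python
def rel_frequency(v: float, v_th: float = 0.3, alpha: float = 1.5) -> float:
    """Operating frequency up to a constant factor, per Eq. (2.2)."""
    return (v - v_th) ** alpha / v

def power(v: float, f: float, c: float = 1.0, i_dc: float = 0.05) -> float:
    """Eq. (2.1): dynamic term f*C*V^2 plus static term V*I_DC (arbitrary units)."""
    return f * c * v ** 2 + v * i_dc

f_ref = rel_frequency(1.0)
for v in (1.0, 0.8, 0.6):
    f = rel_frequency(v) / f_ref  # frequency relative to the 1.0 V setting
    print(f"V = {v:.1f}  f = {f:.2f}  P = {power(v, f):.3f}")
```

With these assumed values, lowering $V$ from 1.0 to 0.8 reduces the attainable frequency by roughly a quarter but cuts power roughly in half, which is the leverage DVFS exploits.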
The static power lost to leakage current is $V \cdot I_{DC}$. Leakage current, the source of static power consumption, is a combination of sub-threshold and gate-oxide leakage: $I_{leak} = I_{sub} + I_{ox}$. The power due to these two leakage currents scales as $P_{sub} \propto 1 - e^{V}$ and $P_{gate} \propto V^2$ [22]. Because both $P_{sub}$ and $P_{gate}$ are functions of the supply voltage, leakage power can be reduced by voltage scaling and by sleep transistors. Leakage power is expected to contribute an increasingly dominant share of power consumption in future silicon technologies.
Dynamic voltage and frequency scaling (DVFS) can be applied at different levels of granularity [23]. Per-chip DVFS uses the same power delivery network to reach every core and consequently binds all cores to the same DVFS schedule. Per-core DVFS uses a separate voltage regulator for each core and therefore allows every core to have an independent DVFS schedule. Cluster-level DVFS uses multiple on-chip regulators, each driving a DVFS domain (cluster) of one or more cores. The finer the granularity, the more complex the design and the larger the overhead. The trend towards multi-processor architectures makes scaling on individual processors an attractive approach, since many applications, especially digital signal processing applications, map well onto parallel processing architectures.
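The cost of coarse granularity shows up in a small example. Under per-chip DVFS every core must run at the speed demanded by the busiest core, whereas per-core DVFS lets each core run at its own required speed. The sketch assumes a dynamic power of $s^3$ per active core at normalized speed $s$ (the usual cubic approximation when voltage scales roughly with frequency) and ignores leakage and idle power; these are simplifying assumptions, and the function names are hypothetical.

```python
from typing import List

def energy(work: List[float], speeds: List[float]) -> float:
    """Dynamic energy when core i executes work[i] cycles at speed speeds[i].
    With power s**3 and run time work/s, energy = work * s**2."""
    return sum(w * s ** 2 for w, s in zip(work, speeds))

def per_core(work: List[float]) -> float:
    # Each core runs just fast enough to finish its work within a unit period.
    return energy(work, work)

def per_chip(work: List[float]) -> float:
    # One shared speed, dictated by the most heavily loaded core.
    s = max(work)
    return energy(work, [s] * len(work))

loads = [0.9, 0.3, 0.3, 0.3]
print(per_core(loads), per_chip(loads))
```

With one core at load 0.9 and three at 0.3, the shared-speed configuration uses about 1.8 times the dynamic energy of per-core scaling under these assumptions.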
2.1.2 Multiple Supply Voltage
As dynamic power is proportional to the square of the supply voltage, reducing the supply voltage can significantly reduce active power consumption. Multi-supply voltage (MSV) [24] was introduced to provide a finer-grained trade-off between power reduction and performance.
MSV allows each part of a chip to run as slowly as its timing permits, at the lowest possible supply voltage. Modules on critical paths use the highest voltage level, so the required timing constraints are met, while modules on non-critical paths use lower voltages, which reduces energy consumption. Consequently, two or more supply voltages are employed on the chip. Reducing the supply voltage, however, increases circuit delay, which is why only modules on non-critical paths are moved to the lower levels. This scheme tends to result in smaller area overhead than parallel architectures.
There are two types of MSV. In the row-based type, rows of high and low supply voltage standard cells are interleaved in the placement. In the region-based type, circuits are partitioned into voltage islands, where each voltage island occupies a contiguous physical space and operates at a supply voltage that meets its performance requirement.
Power reduction with MSV can be considered from two aspects: multiple voltage scheduling and MSV design. The multiple voltage scheduling problem is to assign to each operation in a data flow graph (DFG) a supply voltage selected from a finite, known set of levels, and to schedule the operations so as to minimize energy consumption under given timing constraints. JuiMing Chang [25] presented a dynamic programming technique for solving the multiple supply voltage scheduling problem.
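The flavor of the multiple supply voltage scheduling problem can be seen on the simplest possible data flow graph: a chain of operations executed one after another. The sketch below is a knapsack-style dynamic program over integer delay budgets. It is not the algorithm of [25], which handles general DFGs, and the per-level `(delay, energy)` table `LEVELS` is a made-up assumption.

```python
from typing import List, Optional, Tuple

# Hypothetical (delay, energy) cost of one operation at each supply voltage level,
# ordered from the highest voltage (fast, costly) to the lowest (slow, cheap).
LEVELS = [(1, 9.0), (2, 4.0), (3, 2.0)]

def schedule_chain(n_ops: int, deadline: int) -> Optional[Tuple[float, List[int]]]:
    """Minimum-energy choice of a voltage level for each of n_ops chained
    operations such that their total delay does not exceed the deadline."""
    INF = float("inf")
    # best[t] = (energy, levels) over the operations seen so far with total delay t
    best: List[Tuple[float, List[int]]] = [(0.0, [])] + [(INF, [])] * deadline
    for _ in range(n_ops):
        nxt: List[Tuple[float, List[int]]] = [(INF, [])] * (deadline + 1)
        for t, (e, levels) in enumerate(best):
            if e == INF:
                continue
            for lvl, (d, cost) in enumerate(LEVELS):
                if t + d <= deadline and e + cost < nxt[t + d][0]:
                    nxt[t + d] = (e + cost, levels + [lvl])
        best = nxt
    e, levels = min(best, key=lambda item: item[0])
    return None if e == INF else (e, levels)

print(schedule_chain(3, 6))  # (12.0, [1, 1, 1])
```

Tightening the deadline forces some operations onto the fast, energy-hungry level; with a deadline of 2 the three operations cannot be scheduled at all and the function returns `None`.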
In MSV design, dividing the circuit into regions is a key problem, especially for region-based MSV. Designers partition circuits into a few groups based on their performance requirements and the connectivity between modules, and each group is then assigned a supply voltage. Logic boundaries are widely used for group partitioning. However, these natural boundaries in a design are almost always non-optimal boundaries for supply voltages.
Another problem with MSV is routing multiple supply voltage lines. In region-based MSV, each voltage island needs its own power lines, but fragmented power networks are costly, because implementing such a complex power network consumes a large share of the design's precious routing resources. There is therefore a trade-off between lower energy dissipation and higher routing cost. Huaizhi Wu [26] investigated voltage island boundary design in a circuit for the optimal trade-off between power and design cost under timing requirements.
2.1.3 Dynamic Power Management
Dynamic power management refers to power management schemes implemented in electronic
systems while programs are running [9].
Electronic systems can be viewed as collections of components, which may be heterogeneous in nature. Such components may be active at different times and correspondingly consume different fractions of the power budget. Electronic systems are designed to deliver peak performance when requested. Nevertheless, peak performance is required only during some time intervals, so system components are not always required to be in the active state [27].
DPM can enable and disable components, as well as tune their performance to the workload, to achieve energy efficiency. DPM is a design methodology for dynamically reconfiguring systems to provide the requested services and performance levels with a minimum number of active components or a minimum load on such components. At the heart of DPM is a power manager, which implements a control procedure, or user-defined and/or application-specific power management strategies [28], to control the power state of each component based on observations of and/or assumptions about the workload.
A component can be modeled by a finite-state representation called a power state machine (PSM), whose states are the component's various modes of operation. DPM controls the power consumption of a component by transitioning it between power modes. State transitions have a power and delay cost. In general, low-power states have lower performance and larger transition latency than states with higher power.
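A standard way to reason about these transition costs is the break-even time: the shortest idle period for which moving a component to a sleep state saves energy. For a two-state PSM, staying on for an idle period $T$ costs $P_{on} T$, while sleeping costs $E_{tr} + P_{sleep}(T - T_{tr})$, where $E_{tr}$ and $T_{tr}$ are the energy and duration of the round-trip transition. The sketch below equates the two; the function names and numbers are illustrative assumptions.

```python
def break_even_time(p_on: float, p_sleep: float, e_tr: float, t_tr: float) -> float:
    """Shortest idle time for which sleeping pays off.

    p_on:    power drawn while idle in the on state (W)
    p_sleep: power drawn in the sleep state (W)
    e_tr:    energy of the on -> sleep -> on round trip (J)
    t_tr:    duration of that round trip (s)
    """
    if p_on <= p_sleep:
        raise ValueError("the sleep state must draw less power than the on state")
    # Solve p_on*T = e_tr + p_sleep*(T - t_tr) for T.
    t = (e_tr - p_sleep * t_tr) / (p_on - p_sleep)
    return max(t, t_tr)  # the idle period must at least cover the transitions

def should_sleep(idle: float, p_on: float, p_sleep: float, e_tr: float, t_tr: float) -> bool:
    """Oracle decision for an idle period whose length is known in advance."""
    return idle > break_even_time(p_on, p_sleep, e_tr, t_tr)

print(break_even_time(2.0, 0.2, 5.0, 1.0))  # about 2.67 s
```

In practice the idle period is not known in advance, so a power manager must predict it, for example with a timeout, which is where the observations and assumptions about the workload come in.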
DPM can be implemented at several levels: component, system, and network.
Vassos Soteriou and Li-Shiuan Peh [29] applied DPM to interconnection networks to save power. The links that interconnect network routers are a major consumer of power. They studied the graph connectivity of a 2D mesh topology and power balance in the graph, and proposed a dynamic power management policy in which network links are turned off and switched back on, in a distributed fashion, depending on network utilization.
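A simplified, centralized version of such a link policy is sketched below. A link whose utilization falls below a low threshold is switched off only if the mesh stays connected, and all sleeping links are woken when any active link exceeds a high threshold. The thresholds, the utilization inputs, the wake-everything rule, and the centralized formulation are assumptions for illustration; the policy in [29] is distributed and differs in its details.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set, Tuple

Node = Tuple[int, int]
Link = FrozenSet[Node]

def mesh_links(w: int, h: int) -> Set[Link]:
    """All bidirectional links of a w x h 2D mesh."""
    links: Set[Link] = set()
    for x in range(w):
        for y in range(h):
            if x + 1 < w:
                links.add(frozenset({(x, y), (x + 1, y)}))
            if y + 1 < h:
                links.add(frozenset({(x, y), (x, y + 1)}))
    return links

def connected(nodes: Set[Node], links: Set[Link]) -> bool:
    """Breadth-first search from one node; True if every node is reached."""
    adj: Dict[Node, List[Node]] = {n: [] for n in nodes}
    for link in links:
        a, b = tuple(link)
        adj[a].append(b)
        adj[b].append(a)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for nb in adj[queue.popleft()]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(nodes)

def update_links(all_links: Set[Link], active: Set[Link], util: Dict[Link, float],
                 nodes: Set[Node], low: float = 0.1, high: float = 0.7) -> Set[Link]:
    """One control step of a threshold policy; returns the links left powered on."""
    if any(util.get(lk, 0.0) > high for lk in active):
        return set(all_links)  # congestion: wake every sleeping link
    on = set(active)
    for lk in sorted(active, key=lambda x: util.get(x, 0.0)):
        # Switch off a lightly used link only if the mesh stays connected.
        if util.get(lk, 0.0) < low and connected(nodes, on - {lk}):
            on.discard(lk)
    return on
```

On an idle 3 x 3 mesh, one step prunes the 12 links down to a spanning tree of 8, the minimum that keeps every router reachable.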
2.2 Energy-Efficient Multi-Core Processor with Voltage Island Model
2.2.1 Multi-Core Processor
As CMOS semiconductor technology marches forward in accordance with Moore's Law, more and more transistors can be packed into a single chip, which has caused chip speeds to rise. However, transistors cannot shrink forever, and manufacturers cannot keep making single processor cores more powerful within the limits of current transistor technology. One reason is that as transistors get smaller, their gates are less able to block the flow of electrons, so transistors tend to consume more power; another is that higher clock speeds make transistors switch faster and thus generate more heat and consume more power. Manufacturers therefore shifted to using the available transistors to place multiple independent processors, known as cores, on a single chip; such a chip is called a multi-core processor [30]. In a multi-core processor, different cores execute different instructions operating on different data, and all cores share the same memory.
Multi-core chips do not necessarily run as fast as the highest-performing single-core models, but they can achieve higher throughput at the same clock frequency by handling more work in parallel. There are two kinds of parallelism. The first is instruction-level parallelism, which enables a processor to reorder and pipeline instructions, split them into micro-instructions, and perform aggressive branch prediction. The second is thread-level parallelism. Many new applications are multi-threaded, and when multiple tasks must all run at the same time, thread-level parallelism enables the chip to use a separate core for each task.
For applications that are not inherently parallel, programmers must find good places to break up the application and divide the work into roughly equal pieces that can run at the same time in order to take advantage of multi-core chips.
Krishnan, et al. [31] presented an efficient chip-multiprocessor (CMP) architecture for exploit-
ing thread-level parallelism. This architecture speculatively executes sequential binaries without
the need for source recompilation. It uses software support to identify threads from a sequential
binary. It includes memory disambiguation hardware to detect inter-thread memory dependence
violations.
2.2.2 Voltage Island
Voltage islands are a system architecture and chip implementation methodology that can be used to dramatically reduce active and static power consumption in System-on-Chip (SoC) designs [32].
As the scale of process technologies steadily shrinks, more and more devices can be imple-
mented on a single chip. This enables various applications to be realized as System-on-a-Chip
(SoC) designs. SoCs consist of programmable processors and peripheral cores that are connected
to standard bus-based architectures.
The dynamic power of a core increases with its operating frequency. However, operating frequencies have increased faster than silicon process technology has scaled, which has led to an increase in the power density of SoCs. In addition to active power, there is leakage power, whose most dominant component is the sub-threshold current of the transistors in the circuit; this current drives significant increases in leakage power. The combination of increasing active power density and increasing leakage current has created a power management problem in the semiconductor industry [5].
Core-based design using voltage islands is a new technique that helps reduce both the switching and the standby components of power dissipation. Simply put, a voltage island is a group of on-chip cores powered by the same voltage source, independently of the chip-level voltage supply. Voltage islands allow different portions of the design to operate at different voltage levels in order to optimize the overall chip power consumption. A voltage island is a type of MSV.
Introducing voltage islands makes the chip design process even more complicated with respect to static timing, power routing, and floor-planning.
Floor-planning determines where voltage islands are created on the chip. Hu et al. [33] studied the floor-planning process, which is assumed to happen after the die size and package have been chosen; island partitioning and voltage level assignment are performed under the physical constraints involved in the design process. They proposed a method to minimize power consumption by architecting voltage islands in core-based designs, and presented an algorithm for simultaneous voltage island partitioning, voltage level assignment, and physical-level floor planning.
Wan-Ping Lee et al. [34] handled voltage island partitioning with dynamic programming (DP). Given a netlist without reconvergent fanouts, the DP guarantees an optimal voltage assignment in linear time while meeting the timing constraint.
Chip reliability, i.e., immunity to soft errors, is becoming another important issue as ever larger numbers of transistors are packed into a chip. Soft errors are circuit errors caused by excess charge carriers induced primarily by external radiation. Reducing the supply voltage is one of the most effective ways to reduce system power consumption, but it makes cores more sensitive to soft errors. Yang et al. [35] proposed reliability-based voltage island partitioning and floor-planning for System-on-a-Chip (SoC) design.
2.2.3 Energy-Efficient Task Scheduling on Multi-Core Processors
Power-aware computing and energy-efficient scheduling of real-time tasks have been well studied in the past decade. Because the power consumption function is convex, dynamic voltage and frequency scaling (DVFS), which slows down the processing speed of processors, is a commonly used technique for energy savings [7] [8]. Based on DVFS, many real-time task scheduling schemes have been developed for uniprocessors and multiprocessors, e.g., [36], [37].
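Convexity explains why slowing down saves energy, and leakage explains why it stops paying off below some speed. Assuming a power model $P(s) = \beta s^{\gamma} + P_s$ with normalized speed $s$, dynamic coefficient $\beta$, exponent $\gamma > 1$, and constant leakage $P_s$ while the core is on (a common textbook model, not one defined in this chapter), the energy per cycle $E(s) = P(s)/s$ is minimized where $dE/ds = \beta(\gamma - 1)s^{\gamma-2} - P_s/s^2 = 0$, i.e., at the critical speed $s^* = \left(P_s / (\beta(\gamma - 1))\right)^{1/\gamma}$. The sketch checks this closed form against a grid search; all parameter values and function names are assumptions.

```python
def energy_per_cycle(s: float, beta: float = 1.0, gamma: float = 3.0, p_s: float = 0.1) -> float:
    """E(s) = P(s)/s with P(s) = beta*s**gamma + p_s (leakage while the core is on)."""
    return (beta * s ** gamma + p_s) / s

def critical_speed(beta: float = 1.0, gamma: float = 3.0, p_s: float = 0.1) -> float:
    """Speed at which dE/ds = beta*(gamma-1)*s**(gamma-2) - p_s/s**2 vanishes."""
    return (p_s / (beta * (gamma - 1))) ** (1 / gamma)

def best_speed(required: float, s_max: float = 1.0) -> float:
    """Slowest speed that meets the deadline, but never below the critical speed."""
    if required > s_max:
        raise ValueError("the workload is not schedulable at s_max")
    return min(s_max, max(required, critical_speed()))

grid = [i / 1000 for i in range(1, 1001)]
s_grid = min(grid, key=energy_per_cycle)
print(critical_speed(), s_grid)  # both close to 0.368
```

Below $s^*$, the extra time spent with the core powered on costs more leakage energy than the dynamic energy saved, which is why leakage-aware schedulers clamp speeds at the critical speed.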
There have been studies on scheduling real-time tasks on multi-core processors. Most of them
assume per-core DVFS, where each core has its own power supply and can change its voltage and
frequency level independently from other cores, e.g., [38], [39], [40], [41] [42]. Chen et al. [38]
investigated energy-efficient scheduling of periodic real time tasks over multiple DVFS processors
with the leakage power consideration and proposed a polynomial time algorithm, Largest Task
First (LTF), to derive task mappings that try to execute at a critical frequency. Vinay Devadas et
al. [42] suggested a real-time scheduling algorithm which keeps the performance demands of each
core balanced by migrating tasks between cores. In addition, they proposed the Dynamic Core Scaling algorithm, which changes the number of active cores to reduce leakage power consumption under low load.
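The core idea of Largest Task First partitioning can be sketched in a few lines: tasks are sorted by utilization in non-increasing order, and each is assigned to the currently least-loaded core, which keeps the maximum core load, and hence the speed the cores must run at, low. This is a simplified reading of LTF that omits the leakage-aware choice of the number of active cores and the speed assignment in [38]; the task utilizations and function names are made up.

```python
import heapq
from typing import Dict, List

def largest_task_first(utils: Dict[str, float], m: int) -> List[List[str]]:
    """Assign tasks to m cores: largest utilization first, to the least-loaded core."""
    heap = [(0.0, core) for core in range(m)]  # (current load, core index)
    cores: List[List[str]] = [[] for _ in range(m)]
    for task in sorted(utils, key=utils.get, reverse=True):
        load, core = heapq.heappop(heap)
        cores[core].append(task)
        heapq.heappush(heap, (load + utils[task], core))
    return cores

def core_loads(cores: List[List[str]], utils: Dict[str, float]) -> List[float]:
    return [sum(utils[t] for t in c) for c in cores]

utils = {"a": 0.5, "b": 0.375, "c": 0.25, "d": 0.125}
cores = largest_task_first(utils, 2)
print(cores, core_loads(cores, utils))  # [['a', 'd'], ['b', 'c']] [0.625, 0.625]
```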
Some recent works have explored the voltage islands model. However, they focused on either
configurations of voltage islands or different applications. Qi and Zhu [43] studied symmetric and
asymmetric block configurations for multi-core processors, which contain the same and different
number of cores on each block, respectively. Each block has a DVFS-enabled power supply. They
investigated the block-partitioned core configurations for multi-core processors and evaluated the
energy efficiency for different block configurations.
Ozturk et al. [44] observed that at compile time, the loads across the cores of a chip are imbalanced. They developed algorithms to map application code to voltage islands and to assign different voltages to different processors to save energy. Santiago and Chen [45] studied periodic real-time task scheduling on voltage island based multi-core processors and proposed a Single Frequency Approximation (SFA) scheme that executes at a single voltage and frequency, namely the lowest voltage and frequency that satisfy the timing constraints. However, they used the original Largest Task First (LTF) algorithm. Devadas et al. [46] investigated energy management, taking into account both static and dynamic power, in a multi-core processor with one voltage island running a periodic real-time workload that is partitioned across the processing cores. In their solution, they first select a subset of cores on which the workload can be executed without dissipating excessive static power, and then assign core speeds by DVFS to reduce dynamic power.
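Under the Single Frequency Approximation, all cores of a voltage island share one frequency, so once tasks have been partitioned the island runs at the lowest available frequency that covers its most heavily loaded core. The sketch below shows only that selection step; the discrete frequency levels and the function name are assumptions, and core loads are expressed as utilizations at the maximum frequency.

```python
from typing import Sequence

def sfa_frequency(core_loads: Sequence[float], levels: Sequence[float], f_max: float) -> float:
    """Lowest frequency level at which every core of the island meets its deadlines.

    core_loads are utilizations measured at f_max; at frequency f a core with
    load u has utilization u * f_max / f, which must not exceed 1.
    """
    needed = max(core_loads) * f_max
    for f in sorted(levels):
        if f >= needed:
            return f
    raise ValueError("the island is overloaded even at the highest level")

# Hypothetical island with levels in GHz and the per-core loads of a task partition.
print(sfa_frequency([0.625, 0.625, 0.25], [0.4, 0.8, 1.2, 1.6, 2.0], 2.0))  # 1.6
```

Because the busiest core dictates the shared frequency, the quality of the partition directly determines how low the island can run.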
Kong et al. [47] presented a real-time task scheduling approach that determines the proper number of active voltage islands, the task partition, and the frequency assignment for the voltage island model. They used the Largest Task First (LTF) algorithm to partition tasks into subsets and assign the subsets to cores. In their model, cores in the same block can work in different power modes: some cores run at a certain speed to process tasks, while others are set idle with no load assigned.
Sheshadri et al. [48] employed DVFS in session-less, power-constrained test scheduling of a system-on-chip (SoC). Scaling voltage and frequency using DVFS can alter the test time and power of a core test, but in SoC test scheduling it is restricted by the maximum frequency limit of individual cores and the power limit of the SoC. They proposed heuristic approaches to minimize the overall test time of an SoC by scaling its voltage and frequency under its power limit and the maximum frequency limits of individual cores.
Recently, researchers have started exploring energy-efficient scheduling that considers the non-negligible power consumption of leakage current in current and future circuit manufacturing processes [38] [46] [42]. To save energy, cores or voltage islands may be turned off when they are not needed. Chen [38] calculated the approximation ratio of their algorithm to be 1.283 when leakage power is considered.
2.3 Virtualization
2.3.1 Virtualization
Virtualization, which allows one physical server to be shared among multiple virtual machines (VMs), is a method used to improve the energy efficiency of data centers. Virtualization is implemented in both the server and switch domains, but with different objectives. Server domain virtualization usually achieves energy efficiency by sharing limited resources among different applications.
Virtualization in the data center network domain, on the other hand, aims to implement logically different addressing and forwarding mechanisms on the same physical infrastructure. Data center network virtualization separates logical networks from the underlying physical network, letting each virtual network (VN) implement customized network protocols and management policies [49]. Also, since VNs are logically separated from one another, performance isolation and application QoS are easier to implement. Management of VNs is more flexible because every VN has its own control and management system. Furthermore, the isolation offered in network virtualization environments can also minimize the impact of security threats.
Virtualization enables services to be moved between servers and allows multiple VMs, each serving a different application, to be multiplexed onto one server. Data center resources are often underutilized: the average traffic load accounts for only about 30% of the resources [3]. Data centers can therefore migrate virtual machines to consolidate workloads onto a subset of servers and then shut down the underutilized servers to save a significant amount of power. The VMs to be relocated are typically selected by heuristics based on utilization thresholds.
Stage et al. [50] discussed the impact of VM live migration on network resources. A migration scheduler determines the optimal schedule for the migrations based on knowledge of their duration, starting time, and deadline. It schedules the live migrations so that the network is not congested by the migration load while every migration still completes in time. Beloglazov et al. [51] proposed using live migration of VMs to concentrate jobs on a few physical nodes so that the remaining nodes can be put into a power-saving mode. New VM requests are allocated by sorting all VMs in decreasing order of their current utilization and placing them with a Modified Best Fit Decreasing (MBFD) algorithm.
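A minimal sketch of the MBFD idea of [51] follows: VMs are sorted by decreasing CPU demand, and each is placed on the host, among those with enough spare capacity, whose power consumption increases the least. The linear host power model (idle power plus a utilization-proportional term, with an unloaded host treated as switched off), the `Host` class, and all numbers are assumptions for illustration, not the exact model of [51].

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Host:
    name: str
    capacity: float   # CPU capacity (e.g. MIPS)
    p_idle: float     # power when on but idle (W)
    p_max: float      # power at full utilization (W)
    used: float = 0.0
    vms: List[str] = field(default_factory=list)

    def power(self, used: float) -> float:
        """Assumed linear power model; a host with no load counts as switched off."""
        if used == 0:
            return 0.0
        return self.p_idle + (self.p_max - self.p_idle) * used / self.capacity

def mbfd(vm_demand: Dict[str, float], hosts: List[Host]) -> Dict[str, Optional[str]]:
    """Place VMs in decreasing order of demand on the host with the least power increase."""
    placement: Dict[str, Optional[str]] = {}
    for vm in sorted(vm_demand, key=vm_demand.get, reverse=True):
        need = vm_demand[vm]
        best: Optional[Host] = None
        best_delta = float("inf")
        for h in hosts:
            if h.used + need <= h.capacity:
                delta = h.power(h.used + need) - h.power(h.used)
                if delta < best_delta:
                    best, best_delta = h, delta
        if best is not None:
            best.used += need
            best.vms.append(vm)
        placement[vm] = best.name if best is not None else None
    return placement

hosts = [Host("A", 100, 70, 100), Host("B", 100, 90, 120)]
print(mbfd({"v1": 60, "v2": 30, "v3": 20}, hosts))  # {'v1': 'A', 'v2': 'A', 'v3': 'B'}
```

Filling an already running host costs only its marginal power, so the heuristic naturally consolidates load and leaves other hosts free to be switched off.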
The authors of [52] presented a green, energy-efficient scheduling algorithm that uses priority job scheduling for cloud computing. Priority job scheduling is used to select the VMs that execute jobs; VMs are selected according to a weight computed from their resources and the SLA level required by users. The method can satisfy the minimum resource requirement of a job and prevent excess use of resources. DVFS is used to control the supply voltage and frequency of the servers, which reduces the energy consumption of a server when it is idle or lightly loaded.
Wang et al. [53] studied the problem of achieving energy efficiency in data center networks using traffic engineering. They formulated a time-aware virtual machine placement problem. Based on the unique features of data center networks and the switch power model, they proposed three main principles for virtual machine assignment to achieve energy efficiency of the network in data centers. They then analyzed the relation between power consumption and routing and proposed a two-phase energy-efficient routing algorithm.
2.3.2 Bandwidth Guarantee
Cloud data centers depend on high-performance networks to connect servers within the data center and to the rest of the world. Many cloud customers want to be able to rely on network performance guarantees.
Ballani et al. [10] summarize several measurement studies showing huge variations in intra-
cloud bandwidth, and describe how performance variability leads to poor and unpredictable appli-
cation performance. Some of these studies have also shown that no-guarantee cloud networks also
suffer from high and highly variable latency, and high loss rates. In Oktopus [10], they proposed a
recursive VM allocation approach for homogeneous bandwidth demand.
Guo et al. [13] proposed the Virtual Data Center (VDC) abstraction for their SecondNet architecture, with three service models: type-0 service guarantees bandwidth between pairs of VMs; type-1 service provides ingress/egress guarantees for specific endpoints; other traffic is treated as best-effort. An endpoint can be an entire VM or a TCP/UDP port on a VM.
Wang et al. [54] proposed an online VM packing algorithm that accounts for dynamic bandwidth demand in data centers. Unlike traditional VM packing schemes, which characterize the network bandwidth demand of a VM by a fixed value, they capture VM bandwidth demand by random variables following probabilistic distributions. They pack VMs into servers such that the number of servers used is minimized and the probability that VM bandwidth demand violates the network constraint stays below a threshold.
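One common way to make the "violation probability below a threshold" constraint concrete is to assume independent, normally distributed VM bandwidth demands, so that the total demand on a server is also normal and the constraint becomes $\sum_i \mu_i + z_{1-\epsilon}\sqrt{\sum_i \sigma_i^2} \le C$. The sketch below combines that Gaussian approximation with first-fit packing; both the distributional assumption and the heuristic are illustrative and not necessarily those of [54].

```python
import math
from statistics import NormalDist
from typing import List, Tuple

def fits(demands: List[Tuple[float, float]], capacity: float, eps: float) -> bool:
    """P(total demand > capacity) <= eps under a Gaussian approximation.

    demands: (mean, standard deviation) of each VM's bandwidth demand.
    """
    mu = sum(m for m, _ in demands)
    sigma = math.sqrt(sum(s * s for _, s in demands))
    z = NormalDist().inv_cdf(1 - eps)  # about 1.645 for eps = 0.05
    return mu + z * sigma <= capacity

def first_fit(vms: List[Tuple[float, float]], capacity: float, eps: float) -> List[List[int]]:
    """Pack VM indices into servers, opening a new server when none fits."""
    servers: List[List[int]] = []
    for i, vm in enumerate(vms):
        for s in servers:
            if fits([vms[j] for j in s] + [vm], capacity, eps):
                s.append(i)
                break
        else:
            servers.append([i])
    return servers

print(first_fit([(3.0, 1.0)] * 4, 10.0, 0.05))  # [[0, 1], [2, 3]]
```

With zero variance the same four VMs would fit three to a server; the variance term is what reserves headroom for demand fluctuations.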
However, Oktopus [10] and SecondNet [13] are not work-conserving. A work-conserving scheme lets VMs use spare bandwidth from unreserved capacity beyond their guarantees, so the network can be fully utilized. Popa et al. [12] applied work conservation in ElasticSwitch, which provides minimum bandwidth guarantees with hose model abstractions. The hose model offers the abstraction that all VMs of one tenant appear to be connected to a single virtual switch through dedicated links. Each hose bandwidth guarantee is transformed into pairwise VM-to-VM rate limits, and work conservation is achieved by dynamically increasing the rate limits when the network is not congested.
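The transformation of hose guarantees into pairwise guarantees can be illustrated with a simple rule: each VM's hose guarantee is split equally among its active peers, and a pair gets the smaller of the sender's share and the receiver's share. With this rule the pairwise guarantees never add up to more than any VM's hose guarantee in either direction. It is a simplified stand-in, assumed for illustration, for the more adaptive guarantee partitioning in ElasticSwitch; the function name, VM names, and numbers are hypothetical, and the work-conserving step that raises limits above the guarantees is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def pairwise_guarantees(hose: Dict[str, float],
                        flows: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Split per-VM hose guarantees into VM-to-VM guarantees.

    hose:  guarantee of each VM, assumed equal for sending and receiving
    flows: distinct active (sender, receiver) pairs
    """
    out_peers: Dict[str, int] = defaultdict(int)
    in_peers: Dict[str, int] = defaultdict(int)
    for src, dst in flows:
        out_peers[src] += 1
        in_peers[dst] += 1
    return {
        (src, dst): min(hose[src] / out_peers[src], hose[dst] / in_peers[dst])
        for src, dst in flows
    }

g = pairwise_guarantees({"A": 100.0, "B": 100.0, "C": 100.0},
                        [("A", "C"), ("B", "C"), ("A", "B")])
print(g)  # every pair gets 50.0
```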
Popa et al. [55] identified three main requirements that a desirable solution for sharing cloud networks should meet: minimum bandwidth guarantees, proportionality (ranging from the network level to the link level), and high utilization. They also identified a set of properties to guide the design of allocation policies in this trade-off space. They showed that one cannot simultaneously provide both bandwidth guarantees and network proportionality, and proposed mechanisms that achieve different subsets of the desired properties: link-level proportionality, restricted forms of network proportionality, and minimum bandwidth guarantees over tree-structured networks.
Rodrigues et al. [56] described Gatekeeper, which focuses on providing predictable performance. Gatekeeper attempts to provide each tenant with the illusion of a single, non-blocking switch connecting all of its VMs. Each VM is given a guaranteed bandwidth, specified per VM, into and out of this switch. Optionally, a VM's maximum bandwidth can be set larger than its guarantee to allow the use of otherwise underutilized bandwidth. This allows the provider to trade off efficiency against predictability by adjusting either or both of the minimum and maximum bandwidths.
2.3.3 Pipe Model
There are four granularities of bandwidth guarantees: tenant aggregate, the per-VM hose model (see Figure (2.2)), the per-VM pipe model (see Figure (2.1)), and the per-flow QoS model. These models are listed in increasing granularity and flexibility of expressing a tenant's needs.
Jiang et al. [57] investigated VM placement for multiple tenants in a data center. They formulated a problem that jointly optimizes VM placement and traffic routing in the data center. The VM traffic of a tenant is represented by a traffic matrix, i.e., a pipe model. They proposed heuristic algorithms based on Markov chains to solve it.
Figure 2.1: Pipe Model
Guo et al. [13] proposed an algorithm to map a set of VMs to a data center with arbitrary topology. Their algorithm has two steps: the first decides whether the allocation is feasible, and the second determines how to allocate the VMs. They extract clusters of physical servers of different sizes from the data center and map the VMs to these clusters, which act like containers holding the VMs and their traffic. However, it is hard to know the proper size of a cluster before making an allocation, and how to extract clusters efficiently from a data center is another issue.
Wang et al. [58] investigated the power model of switches and proposed an algorithm to allocate VMs in a fat-tree data center. Their algorithm first repeatedly builds "super" VMs by merging the VM pairs with the largest traffic load and assigning them to a server until the server's resources are exhausted; it then regroups these "super" VMs by k-means clustering and places the resulting clusters into pods. Their algorithm can be further refined.
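The first phase of the approach in [58], merging the most heavily communicating VMs into "super" VMs that still fit on one server, can be sketched as a greedy procedure. In this hypothetical sketch, server capacity is modeled simply as a maximum number of VMs (`slots`), the traffic values are made up, and the k-means regrouping into pods is not shown.

```python
from typing import Dict, FrozenSet, List, Tuple

def merge_super_vms(vms: List[str], traffic: Dict[Tuple[str, str], float],
                    slots: int) -> List[FrozenSet[str]]:
    """Greedily merge the pair of groups with the heaviest traffic between them,
    as long as the merged group fits on one server (at most `slots` VMs)."""
    groups = [frozenset([v]) for v in vms]

    def between(a: FrozenSet[str], b: FrozenSet[str]) -> float:
        return sum(traffic.get((x, y), 0.0) + traffic.get((y, x), 0.0) for x in a for y in b)

    while True:
        candidates = [(between(a, b), i, j)
                      for i, a in enumerate(groups) for j, b in enumerate(groups)
                      if i < j and len(a) + len(b) <= slots]
        if not candidates:
            break
        load, i, j = max(candidates)
        if load == 0:
            break  # no remaining traffic worth co-locating
        merged = groups[i] | groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return groups

t = {("a", "b"): 10.0, ("c", "d"): 8.0, ("b", "c"): 5.0}
print(merge_super_vms(["a", "b", "c", "d"], t, slots=2))  # groups {a, b} and {c, d}
```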
Meng et al. [59] studied the optimization of VM placement in data centers to improve scalability: VMs with large mutual bandwidth usage are assigned to host machines in close proximity in the data center. The NP-hard VM placement problem is approximated using a two-tier algorithm that takes as input the traffic matrix between the VMs and the cost matrix between the hosts. The algorithm partitions the servers into clusters based on the cost between them, and then partitions the VMs into VM clusters in a way that minimizes inter-cluster traffic.
Daniel et al. [60] proposed a traffic-aware algorithm for VM placement under a heterogeneous bandwidth model. Their approach partitions the VM set by repeatedly removing the edge with the minimum traffic load in the requested virtual network until every partition can be assigned to servers of the data center, and it extracts VM patterns from the traffic matrix. However, they did not consider network resource allocation or bandwidth guarantees for VM placement.
Fang et al. [61] investigated data center network cost optimization based on VM placement. They use the VM traffic matrix to allocate VMs in data centers: a Gomory-Hu tree is used to assign VMs to physical servers, and a tabu search approach then maps the physical servers to data center racks.
Li [62] studied the VM placement problem for cost minimization. The total cost consists of the cost of the PMs that host VMs and the network cost, which is mainly determined by inter-PM traffic. They observed that it is hard to minimize both PM cost and network cost, since there is a trade-off between the two. They defined the network cost by various functions according to different communication models and proposed an approximation solution that jointly optimizes PM cost and network cost.
2.3.4 Hose Model
Figure 2.2: Hose Model
The hose model was initially proposed [63] for Virtual Private Networks (VPNs), but it can be applied to data centers as well. The hose model aggregates the demands of multiple different communications: all VMs are connected to a central (virtual) switch by a dedicated link (hose) that has a minimum bandwidth guarantee.
In Oktopus, Ballani et al. [10] proposed two virtual network abstraction models that cater to tenant requirements. The first is the virtual cluster, which abstracts a tenant as a virtual network in which all VMs are connected to a single, non-oversubscribed (virtual) switch, resulting in a one-level tree topology. The virtual cluster is a homogeneous hose model: each VM requires the same amount of compute resources and the same link bandwidth to the virtual switch. The virtual cluster is suitable for all applications, but the tenant cost is relatively high and the provider's flexibility is limited. The second model is the Virtual Oversubscribed Cluster (VOC), which interconnects clusters of VMs, each with switch-to-VM bandwidth B, with a fixed oversubscription factor O. Compared with the virtual cluster model, VOC is not suitable for all applications, but it gives providers more flexibility in allocation and lowers tenant cost.
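For the virtual cluster abstraction, the bandwidth that must be reserved on a physical link follows directly from the hose model. If a link separates m of a tenant's N VMs from the remaining N - m, traffic across it is limited both by what the m VMs can send and by what the N - m VMs can receive, so the reservation needed is min(m, N - m) · B for per-VM bandwidth B; this is the rule Oktopus [10] uses when placing a virtual cluster. The helper name and example values below are illustrative.

```python
def vc_link_bandwidth(m: int, n: int, b: float) -> float:
    """Bandwidth to reserve on a link with m of a virtual cluster's n VMs on one
    side, for a homogeneous hose model with per-VM bandwidth b."""
    if not 0 <= m <= n:
        raise ValueError("m must lie between 0 and n")
    return min(m, n - m) * b

# A hypothetical tenant with N = 10 VMs and B = 100 Mbps: placing 3 VMs under one
# ToR switch means its uplink must carry up to min(3, 7) * 100 = 300 Mbps.
print(vc_link_bandwidth(3, 10, 100.0))  # 300.0
```

Placing all of a tenant's VMs under one switch (m = N) needs no uplink reservation at all, which is why compact placements save network resources.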
Zhu et al. [64] generalized the virtual cluster, a homogeneous hose model, to a heterogeneous hose model in which each VM can have a different bandwidth guarantee. This model can express the requirements of applications composed of multiple tiers with complex traffic interactions more accurately than the virtual cluster. The bandwidth between VMs of the same tenant can vary significantly over time, depending on the network load and the usage of other tenants.
With the hose model, each VM gets a minimum guarantee for all its traffic, irrespective of whether the traffic is destined to the same tenant or not. However, in a multi-tenant data center, tenants can access other tenants or services, so inter-tenant communication should be considered. Ballani et al. [65] proposed a hierarchical hose model with communication dependencies as an abstraction for inter-tenant communication. In this model, each VM of a tenant gets a minimum bandwidth guarantee irrespective of whether its traffic is intra- or inter-tenant. Beyond this, the hierarchical hose model introduces an aggregate abstraction for a tenant's inter-tenant traffic, so the tenant also gets a minimum bandwidth guarantee for its aggregate inter-tenant traffic.
In [66], the authors first profiled the traffic patterns of several popular cloud applications and found that they generate substantial traffic during only 30% to 60% of the entire execution, suggesting that existing simple virtual cluster (VC) models waste precious networking resources. They then proposed a fine-grained virtual network abstraction, Time-Interleaved Virtual Clusters (TIVC), that models the time-varying nature of the networking requirements of cloud applications.
Based on the hierarchical hose model, Shen [67] proposed a Dual-Hose Model for bandwidth guarantees in multi-tenant cloud networks to reduce over-provisioning when accommodating tenant requests. This model decouples the guarantee for each VM's inter-tenant traffic from the guarantee for its intra-tenant traffic. Each VM is provided with a minimum bandwidth guarantee for its traffic to other VMs of the same tenant and a minimum bandwidth guarantee $B_e$ for its traffic to other tenants, whereas in the hierarchical hose model a VM only has a minimum bandwidth guarantee for its aggregate inter- and intra-tenant traffic.
Lee [11] studied interactive applications, such as web and OLTP applications, hosted in today's cloud environments. These applications have complex, tiered structures. Lee proposed the TAG model to abstract the bandwidth requirements of these multi-tier applications. A TAG is a graph in which each vertex represents an application tier, i.e., a set of VMs performing the same function. Bandwidth guarantees from one tier to another are labeled as directed edges between the corresponding vertices. If the VMs within a tier exchange traffic among themselves, they form a hose model. For traffic from tier A to tier B, the VMs in tiers A and B form a directional hose model.
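The directional hose semantics above bound how much bandwidth a tenant can push across any single physical link. A minimal sketch, assuming each VM of the sending tier of an edge is guaranteed `send_bw` and each VM of the receiving tier `recv_bw` (both names and the class `TagEdge` are hypothetical, not taken from [11]): traffic leaving a subtree on an edge A to B cannot exceed what the senders inside can send, nor what the receivers outside can absorb. A plain hose model is the special case of a self-edge with equal guarantees B, which reduces to min(m, N - m) * B for m of N VMs inside the subtree.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TagEdge:
    """Directed TAG edge: each source-tier VM may send up to send_bw and
    each destination-tier VM may receive up to recv_bw (illustrative units)."""
    src: str
    dst: str
    send_bw: float
    recv_bw: float

def outbound_demand(edges, tier_sizes, inside):
    """Worst-case bandwidth leaving a subtree on its uplink.

    tier_sizes: total number of VMs in each tier.
    inside: number of VMs of each tier placed inside the subtree.
    """
    total = 0.0
    for e in edges:
        senders_in = inside.get(e.src, 0)
        receivers_out = tier_sizes[e.dst] - inside.get(e.dst, 0)
        # Bounded both by the senders' aggregate guarantee and by the
        # receivers' aggregate guarantee on the other side of the link.
        total += min(senders_in * e.send_bw, receivers_out * e.recv_bw)
    return total

# A web tier sending to a database tier, plus an intra-tier hose on "app".
web_db = [TagEdge("web", "db", send_bw=10, recv_bw=20)]
print(outbound_demand(web_db, {"web": 4, "db": 2}, {"web": 3}))
```

Counting each directed edge separately keeps the sketch simple; a real placement algorithm would also account for the reverse direction of the link.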
2.4 Data Center Architecture
2.4.1 Data Center Architecture
Data centers can be categorized into two main classes: switch-centric and server-centric. In a switch-centric data center, switches are the dominant components for interconnection, while in a server-centric data center, servers with multiple Network Interface Cards (NICs) take part in routing and packet-forwarding decisions. Switch-centric data centers include VL2 [68], PortLand [69], and Fat-Tree [70]. Server-centric data centers include DCell [71] and BCube [72].
The fat-tree topology, depicted in Figure (2.3), consists of k pods, each of which contains $k/2$ edge switches and $k/2$ aggregation switches. Within each pod, the edge and aggregation switches are connected as a Clos topology and form a complete bipartite graph. In addition, each pod is connected to all core switches, forming another bipartite graph. A fat-tree built with identical k-port switches in all layers of the topology supports $k^3/4$ hosts. Fat-tree IP addresses are of the form 10.pod.subnet.hostid. In a fat-tree, address lookup is implemented by a two-level table lookup instead of plain longest-prefix matching. Lookups are done in two steps: first, the lookup engine searches the first-level table for the longest matching prefix; then the matched entry is used to index the second-level table, which holds the information on the IP address and the output port needed to reach the intended destination.
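The element counts of a fat-tree follow directly from the construction above. A small sketch, assuming the standard layout in which each edge switch uses k/2 ports for hosts and k/2 for aggregation switches, which yields $(k/2)^2$ core switches; the function name is illustrative.

```python
def fat_tree_sizes(k: int) -> dict:
    """Element counts of a k-ary fat-tree built from identical k-port switches."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even number")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 edge switches per pod
        "aggregation_switches": k * half,  # k/2 aggregation switches per pod
        "core_switches": half * half,      # (k/2)^2 core switches
        "hosts": k ** 3 // 4,              # k/2 hosts on each of the k^2/2 edge switches
    }

print(fat_tree_sizes(48))
```

For example, commodity 48-port switches are enough to build a fat-tree with 27,648 hosts.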
Figure 2.3: Fat-tree topology (legend: switch, server)
VL2 was proposed in [68] as a solution to some of the critical issues in conventional data centers, such as over-subscription, agility, and fault tolerance. VL2 supports VM migration from server to server while keeping the same address and without breaking TCP connections. VL2 implements a Clos topology between the core and aggregation layers to provide multiple paths and rich connectivity between the two top tiers. VL2 employs Valiant Load Balancing (VLB), implemented with Equal Cost Multi Path (ECMP), to spread traffic flows evenly over the paths.
PortLand was proposed in [69]; its DCN topology is based on the fat-tree network topology. PortLand consists of three layers (edge, aggregation, and core) and is built from low-cost commodity switches. PortLand and fat-tree differ in the addressing scheme used for packet routing, but both ultimately aim to provide agility among services running on multiple machines. Both reduce broadcast traffic by intercepting Address Resolution Protocol (ARP) requests and replacing them with unicast queries to a centralized lookup service. PortLand uses hierarchical Pseudo MAC (PMAC) addresses for its layer-2 routing and forwarding protocol and assigns a unique PMAC address to each end host.
Figure 2.4: VL2 topology (legend: switch, server)
Unlike switch-centric designs, server-centric designs use servers as relay nodes for each other, so that servers participate in traffic forwarding. Server-centric schemes such as BCube [72] and DCell [7] [71] can provide a lower network diameter than switch-centric schemes, provide high capacity, and support all types of traffic, especially for computation-intensive applications with very low delay requirements.
2.4.2 Energy-Efficient Data Center
DCN architectures are over-provisioned for peak loads and fault tolerance. On average, DCNs remain highly underutilized, with an average load of around 5%-25%. This over-provisioning and underutilization can be exploited for energy efficiency. Heller et al. [73] proposed ElasticTree, which consolidates the workload onto a subset of network resources to save energy. Using simulation and a hardware prototype, the authors estimated that energy savings of around 50% are feasible.
Shang [74] proposed an energy-aware routing algorithm for data centers. The objective of energy-aware routing is to reduce power consumption by putting idle devices to sleep or shutting them down, using as few network devices as possible to provide routing without sacrificing network performance. The algorithm first computes the network throughput achieved by basic routing. Then, it gradually removes switches until the network throughput approaches the predefined performance threshold. Finally, it powers off, or puts into sleep mode, the switches that are not involved in the final routing.
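The three steps above can be sketched as a greedy elimination loop. This is a simplified sketch under stated assumptions, not the algorithm of [74] verbatim: the caller supplies a `throughput` function (hypothetical here) that returns the throughput achieved by basic routing over a set of active switches, and the removal order (at each round, drop the switch whose removal keeps throughput highest) is an illustrative choice.

```python
def prune_switches(switches, throughput, threshold_ratio):
    """Greedy switch elimination in the spirit of energy-aware routing.

    switches: iterable of switch identifiers.
    throughput: callable mapping a set of active switches to the throughput
        of basic routing over them (hypothetical, supplied by the caller).
    threshold_ratio: fraction of the full-network throughput that must be kept.
    Returns (switches kept active, switches that can sleep or be powered off).
    """
    all_switches = set(switches)
    active = set(all_switches)
    target = threshold_ratio * throughput(active)    # step 1: baseline throughput
    while True:
        best, best_tp = None, float("-inf")
        for s in sorted(active):                     # step 2: try removing each switch
            tp = throughput(active - {s})
            if tp >= target and tp > best_tp:
                best, best_tp = s, tp
        if best is None:                             # any further removal breaks the threshold
            break
        active.remove(best)
    return active, all_switches - active             # step 3: idle switches can sleep

# Toy model (hypothetical): a demand of 25 units limited by total core capacity.
capacity = {"c1": 10, "c2": 10, "c3": 10, "c4": 10}

def toy_throughput(active):
    return min(25, sum(capacity[s] for s in active))

print(prune_switches(capacity, toy_throughput, 0.8))
```

Lowering `threshold_ratio` trades throughput for more sleeping switches; with the toy model, a ratio of 1.0 frees one switch and 0.8 frees two.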
Fat-tree interconnection networks are among the most popular topologies. A particular characteristic of this topology is that it is designed with redundancy, providing multiple alternative paths for each source/destination pair. Alonso et al. [75] presented a DPM mechanism that dynamically switches network links on and off as a function of traffic and is designed to guarantee network connectivity.
In [76], the authors proposed data center energy-efficient network-aware scheduling (DENS), whose main objective is to balance the energy consumption of a data center against performance, QoS, and traffic demands. DENS achieves this objective through feedback channels between network switches, which are used to amend the distribution of consolidated workloads and thereby avoid congestion or hot spots within the network that would degrade overall performance. Congestion notification signals sent by overloaded switches can prevent congestion, which may otherwise lead to packet losses, and help sustain high data center network utilization.
CHAPTER 3
Preliminary Work: Power Saving Design for a Single Server
3.1 Introduction
Power-aware design for servers is a prominent design issue, since servers consume a major part of the power in a data center. The power consumption of a server can be reduced by Dynamic Voltage and Frequency Scaling (DVFS). However, DVFS is not efficient in modern computer systems, where static power consumption makes up a major share of the total server power consumption. As shown in [77], the static power dissipation of an idle server can reach up to 60% of its peak power, and the situation is worse if the power wasted in the power delivery and cooling subsystems is counted, which can increase the power consumption by 50-100%. One way to reduce static power consumption is to power off a server when it has no job to execute. However, jobs arrive at a server randomly, and when a server is in the sleep mode and a job arrives, the server must wake up to execute it. This causes the server to turn on and off frequently. Dynamic Power Management (DPM) can be used to switch servers between the sleep and idle modes.
DVFS and DPM can be jointly used for power management in a single server. However, slowing down the server and frequently transitioning between the active mode and the sleep mode introduce significant overheads in terms of time and energy, which further increase the task response time.
Clients are very sensitive to server performance, and delayed responses have negative effects for a hosting company, including client frustration and revenue loss. Therefore, the job response time must be considered. Because power saving and performance cannot both be optimized at the same time, there is a trade-off between the power saving of a server and the job response time. Reducing power consumption while maintaining the response time constraint has been an important problem in server system design.
In this chapter, PowerSleep, a smart power saving scheme, is introduced. PowerSleep can
minimize the power consumption of a single server under the mean job response time constraint.
PowerSleep adopts the extended M/G/1/PS queuing model for job arrival and job processing. It
uses both DVFS and DPM techniques to reduce the power consumption of a server.
Since mode transitions between the running mode and the sleep mode introduce a net timing
overhead for a server, they degrade the performance in terms of the mean response time of jobs.
When the server utilization is low, the mode transition incurs little overhead. The higher the server
utilization is, the bigger the mode transition overhead is. Therefore, it is necessary in the design of
the sleep mode to reduce the mode transition overhead as much as possible.
To reduce the mode transition overhead, PowerSleep adds a procrastination sleep period when a new job arrives while the server is still in the sleep mode. The server does not start to work immediately. Instead, it keeps sleeping during the procrastination sleep period to collect more jobs into the queue. After the procrastination sleep period, the server wakes up to process the jobs. This approach decreases the mode transition frequency and thus reduces the mode transition overhead, but it increases the job response time.
3.2 System Model
Only a single server is considered in this system model. Let r be the working speed ratio of the server to its maximum speed, and let $r_l$ be the lowest speed ratio, used when no job is executed, so that $r_l \le r \le 1$. When r is greater than $r_l$, the server is in the running mode. When no job is executed on the server, $r = r_l$ and the server is in the idle mode. To save power, DPM can put a server into the sleep mode. If a job arrives, the server is switched to the running mode to process the job. A server is in the transition mode while it is transitioning between modes.
The power consumption of a server in each mode is as follows:
• Idle mode: The server consumes the static power $P_I$.
• Running mode: The power consumption $P_R(r)$ of a server at speed ratio r is
$$P_R(r) = \alpha (r - r_l)^{\gamma} + P_I \qquad (3.1)$$
where $\gamma \ge 1$.
• Sleep mode: The server consumes $P_S$, where $P_S \ll P_I$.
• Transition mode: The server consumes the transition power $P_T$, which is assumed to be equal to the power in the running mode.
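The per-mode power model can be captured in a few lines. A minimal sketch of Eq. (3.1) and the mode list above; the class name `ServerPower` and the numeric values in the example are hypothetical, chosen only to make the model concrete.

```python
from dataclasses import dataclass

@dataclass
class ServerPower:
    """Per-mode power of a single server, following Eq. (3.1)."""
    alpha: float    # coefficient of the speed-dependent part
    gamma: float    # exponent, gamma >= 1
    r_l: float      # lowest speed ratio (idle)
    p_idle: float   # static power P_I
    p_sleep: float  # sleep power P_S, much smaller than P_I

    def running(self, r: float) -> float:
        """P_R(r) = alpha * (r - r_l)^gamma + P_I for r_l <= r <= 1."""
        if not (self.r_l <= r <= 1.0):
            raise ValueError("speed ratio must lie in [r_l, 1]")
        return self.alpha * (r - self.r_l) ** self.gamma + self.p_idle

    def transition(self, r: float) -> float:
        # P_T is assumed equal to the running-mode power at the chosen speed ratio.
        return self.running(r)

server = ServerPower(alpha=100.0, gamma=3.0, r_l=0.2, p_idle=60.0, p_sleep=5.0)
print(server.running(1.0), server.transition(0.5), server.p_sleep)
```

At $r = r_l$ the running power collapses to the static power $P_I$, which is why idling is only marginally cheaper than slow running and a sleep mode is needed to cut static power.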
The system in this work is based on the M/G/1/PS server model. Jobs arrive at the server according to a Poisson process with arrival rate λ. The job service time follows a general distribution with a given mean E(S) when the job executes at the maximum speed. All jobs in the queue share the processor under the Processor Sharing (PS) scheduling discipline. Let $\mu = 1/E(S)$ and $\rho = \lambda/\mu$.
3.3 The Design of the PowerSleep Scheme
The job response time is an important concern in the design of the PowerSleep scheme, since both DVFS and DPM may prolong the job service time. From the time perspective, a server is either in the working state, when it processes jobs, or in the non-working state, when it is in the idle, sleep, or transition mode. These two states alternate in cycles. A server working cycle is defined as the interval from the time a server changes from the non-working state to the working state until the next time it makes this change.
In the design of PowerSleep, the following constant time parameters are introduced to address this concern while utilizing both DVFS and DPM:
• Idle period threshold δh is the minimum length of the idle-queue duration before the server is put into the sleep power mode.
• Sleep period threshold δe is the maximum length of the period during which the server can stay in the sleep power mode continuously.
• Procrastination sleep period δx: if a job arrives before δe expires, the server is kept in the sleep mode for an additional δx so that jobs can be batched together, reducing the number of short idle periods of the job queue.
• Mode transition periods δs and δw are the transition times from the idle mode to the sleep mode and from the sleep mode to the idle mode, respectively.
Time periods δh, δe, and δx are constant parameters of PowerSleep.
With the above pre-determined constant time parameters (δh, δe, δx) and speed ratio r, PowerSleep can be described by the following steps:
1. Once the queue is empty, the server stays in the idle power mode for δh time units;
2. If a new job arrives before δh expires, the server immediately serves the new job. Otherwise, the server is set to the sleep power mode and is then forced to stay in the sleep power mode for δe time units;
3. If a new job arrives before δe expires, the server remains in the sleep power mode for a procrastination sleep period of δx time units (counted from the arrival of the new job). In other words, the new job waits for δx time units before the server starts to wake up. Otherwise, the server is put in the idle power mode again until a new job arrives;
4. Once the server is in the running power mode, it runs at the constant speed ratio r, serving jobs in the queue until the queue becomes empty;
5. Once the queue is empty, repeat Step 1.
Figure 3.1: Scenarios for PowerSleep
Let δi denote the length of the preceding idle-queue duration, i.e., the time spent waiting for a job arrival. Figure 3.1 illustrates how the power mode changes under PowerSleep in different scenarios, where the X-axis is the timeline and the Y-axis is the workload in the queue. (a) If δi ≤ δh, when a new job arrives, the server immediately serves it; (b) if δh < δi ≤ δh + δs + δe, when a new job arrives, the server remains in the sleep power mode for an extra δx time units and then wakes up to serve it; (c) if δi > δh + δs + δe, after the server wakes up, it stays in the idle power mode until a new job arrives.
3.4 Power Consumption and Response Time Analysis
Under PowerSleep, when a job arrives at an empty queue, it may not be served immediately; instead, the server may require an additional amount of time $T_X$ (called a starter) to move from the non-working state to the working state and serve the first new job. Jobs that arrive while the server is working join the queue and are served in turn, as in an ordinary queuing system. The starter under PowerSleep includes the wake-up transition plus the procrastination sleep period, and may also include the remaining portion of a suspend transition. The Queue with Starter model [78] is adopted here to analyze PowerSleep.
The server has different power consumptions in different modes, and the probabilities of these modes can be obtained with the Queue with Starter model. In an M/G/1 server under PowerSleep with starter $T_X$, job arrival rate λ, and a general service time distribution with mean E(S), the probabilities of the modes are:
$$\pi_R = \lambda E(S) \qquad (3.2)$$
$$\pi_T = \frac{(1-\lambda E(S))\,\lambda(\delta_s+\delta_w)\,e^{-\lambda\delta_h}}{1+\lambda E(T_X)} \qquad (3.3)$$
$$\pi_S = \frac{(1-\lambda E(S))\,e^{-\lambda\delta_h}\left(\lambda\delta_x + e^{-\lambda\delta_s}(1-e^{-\lambda\delta_e})\right)}{1+\lambda E(T_X)} \qquad (3.4)$$
$$\pi_I = 1-\pi_R-\pi_T-\pi_S \qquad (3.5)$$
where $\pi_R$, $\pi_T$, $\pi_S$, and $\pi_I$ are the probabilities of the running, transition, sleep, and idle modes, respectively.
With the probabilities of the power modes, the mean power consumption of an M/G/1 server under PowerSleep is:
$$E(P) = P_R(r)\,\pi_R + P_T\,\pi_T + P_S\,\pi_S + P_I\,\pi_I \qquad (3.6)$$
The response time under PowerSleep must also be considered. It is shown in [78] that the additional delay introduced by a starter is independent of the response time in the system without starters. Using this independence property, the total response time in the system with starters is easy to calculate: it is simply the sum of the response time in the queue without starters and the additional delay $R_X$ introduced by the starter. By classical queuing theory, the mean response time of a job in a server without starters is $E(S)/(1-\lambda E(S))$. By [78], in a system with job arrival rate λ and a starter, the mean additional delay introduced by the starter is
$$E(R_X) = \frac{E(T_X) + \frac{1}{2}\lambda E(T_X^2)}{1+\lambda E(T_X)} \qquad (3.7)$$
In an M/G/1/PS server under PowerSleep with starter $T_X$, job arrival rate λ, and a general service time distribution with mean E(S), the mean response time of a job is
$$E(R) = \frac{E(S)}{1-\lambda E(S)} + E(R_X) \qquad (3.8)$$
In order to evaluate power consumption and response time under PowerSleep, $E(T_X)$ and $E(T_X^2)$ must be obtained.
By the definition of a starter in the Queue with Starter model, the starter $T_X$ under PowerSleep includes the wake-up transition plus the procrastination sleep period and may also include the remaining portion of a suspend transition, depending on the preceding idle-queue period δi before a new job arrival. Based on the scenarios classified by δi, the starter $T_X$ is summarized as:
$$T_X = \begin{cases} \delta_h + \delta - \delta_i, & \delta_h < \delta_i \le \delta_h + \delta_s \\ \delta_x + \delta_w, & \delta_h + \delta_s < \delta_i \le \delta_h + \delta_s + \delta_e \\ \delta_h + \delta + \delta_e - \delta_i, & \delta_h + \delta_s + \delta_e < \delta_i \le \delta_h + \delta + \delta_e \\ 0, & \text{otherwise} \end{cases} \qquad (3.9)$$
where $\delta = \delta_s + \delta_x + \delta_w$.
The preceding idle-queue period δi is the same as the idle period in an ordinary M/G/1 model, which follows the exponential distribution with mean $1/\lambda$. Therefore, for $T_X$ defined in (3.9), its first and second moments can be obtained as:
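$$E(T_X) = \int_{\delta_h}^{\delta_h+\delta_s} (\delta_h+\delta-t)\,\lambda e^{-\lambda t}\,dt + \int_{\delta_h+\delta_s}^{\delta_h+\delta_s+\delta_e} (\delta_x+\delta_w)\,\lambda e^{-\lambda t}\,dt + \int_{\delta_h+\delta_s+\delta_e}^{\delta_h+\delta+\delta_e} (\delta_h+\delta+\delta_e-t)\,\lambda e^{-\lambda t}\,dt$$
$$= e^{-\lambda\delta_h}\left(\delta - \frac{1}{\lambda} + \frac{1}{\lambda}\left(e^{-\lambda\delta_s}(1-e^{-\lambda\delta_e}) + e^{-\lambda(\delta+\delta_e)}\right)\right) \qquad (3.10)$$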
and
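$$E(T_X^2) = \int_{\delta_h}^{\delta_h+\delta_s} (\delta_h+\delta-t)^2\,\lambda e^{-\lambda t}\,dt + \int_{\delta_h+\delta_s}^{\delta_h+\delta_s+\delta_e} (\delta_x+\delta_w)^2\,\lambda e^{-\lambda t}\,dt + \int_{\delta_h+\delta_s+\delta_e}^{\delta_h+\delta+\delta_e} (\delta_h+\delta+\delta_e-t)^2\,\lambda e^{-\lambda t}\,dt$$
$$= e^{-\lambda\delta_h}\left(\left(\delta-\frac{1}{\lambda}\right)^2 + \frac{1}{\lambda^2}\left(1-2e^{-\lambda(\delta+\delta_e)}\right) + \frac{2}{\lambda}\left(\delta_x+\delta_w-\frac{1}{\lambda}\right)e^{-\lambda\delta_s}(1-e^{-\lambda\delta_e})\right) \qquad (3.11)$$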
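The closed forms (3.10) and (3.11) can be sanity-checked numerically. The sketch below implements the piecewise starter of Eq. (3.9) and compares both moments with a midpoint-rule integral against the exponential idle-period density $\lambda e^{-\lambda t}$. The function names and the parameter values are illustrative assumptions, not part of PowerSleep itself.

```python
import math

def starter(di, dh, ds, dx, dw, de):
    """Starter delay T_X of Eq. (3.9) for a preceding idle-queue period di."""
    d = ds + dx + dw
    if dh < di <= dh + ds:                # arrival during the suspend transition
        return dh + d - di
    if dh + ds < di <= dh + ds + de:      # arrival while sleeping
        return dx + dw
    if dh + ds + de < di <= dh + d + de:  # arrival after delta_e has expired
        return dh + d + de - di
    return 0.0                            # arrival while idle, or later

def starter_moments(lam, dh, ds, dx, dw, de):
    """Closed-form E(T_X) and E(T_X^2) of Eqs. (3.10) and (3.11)."""
    d = ds + dx + dw
    a = math.exp(-lam * ds) * (1 - math.exp(-lam * de))
    b = math.exp(-lam * (d + de))
    scale = math.exp(-lam * dh)
    m1 = scale * (d - 1 / lam + (a + b) / lam)
    m2 = scale * ((d - 1 / lam) ** 2 + (1 - 2 * b) / lam ** 2
                  + 2 / lam * (dx + dw - 1 / lam) * a)
    return m1, m2

def numeric_moments(lam, dh, ds, dx, dw, de, steps=200_000):
    """Midpoint-rule integration of T_X and T_X^2 against lam * exp(-lam * t)."""
    upper = dh + ds + dx + dw + de        # T_X is zero beyond this point
    h = upper / steps
    m1 = m2 = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        x = starter(t, dh, ds, dx, dw, de)
        w = lam * math.exp(-lam * t) * h
        m1 += x * w
        m2 += x * x * w
    return m1, m2

params = (0.5, 1.0, 0.3, 0.8, 0.4, 2.0)  # lam, dh, ds, dx, dw, de (illustrative)
print(starter_moments(*params))
print(numeric_moments(*params))
```

The two results agree to within the discretization error of the midpoint rule, which supports reading the integrands with the density factor λ included.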
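Let $\sigma = e^{\lambda\delta_h} + \lambda\delta - 1 + e^{-\lambda\delta_s}(1-e^{-\lambda\delta_e}) + e^{-\lambda(\delta+\delta_e)}$, so that, by (3.10), $\sigma = e^{\lambda\delta_h}(1+\lambda E(T_X))$. With the server running at speed ratio r, the probabilities defined in equations (3.2) to (3.5) can be written as
$$\pi_R = \frac{\rho}{r} \qquad (3.12)$$
$$\pi_T = \left(1-\frac{\rho}{r}\right)\frac{\lambda(\delta_s+\delta_w)}{\sigma} \qquad (3.13)$$
$$\pi_S = \left(1-\frac{\rho}{r}\right)\frac{\lambda\delta_x + e^{-\lambda\delta_s}(1-e^{-\lambda\delta_e})}{\sigma} \qquad (3.14)$$
$$\pi_I = 1-\pi_R-\pi_T-\pi_S \qquad (3.15)$$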
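In an M/G/1/PS server under PowerSleep, substituting (3.12) to (3.15) into (3.6) with $P_T = P_R(r)$ gives the mean power consumption
$$E(P) = \alpha(r-r_l)^{\gamma}\left(\frac{\rho}{r} + \left(1-\frac{\rho}{r}\right)\frac{\lambda(\delta_s+\delta_w)}{\sigma}\right) + P_I - (P_I-P_S)\left(1-\frac{\rho}{r}\right)\frac{\lambda\delta_x + e^{-\lambda\delta_s}(1-e^{-\lambda\delta_e})}{\sigma} \qquad (3.16)$$
where the last term is subtracted because time spent in the sleep mode instead of the idle mode saves $P_I - P_S$. The mean response time of a job is
$$E(R) = \frac{1}{\mu(r-\rho)} + \frac{\frac{\lambda}{2}\delta^2 + (\delta_x+\delta_w)e^{-\lambda\delta_s}(1-e^{-\lambda\delta_e})}{\sigma} \qquad (3.17)$$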
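Equations (3.12) to (3.17) translate directly into a function that evaluates a PowerSleep configuration. A sketch under illustrative assumptions (hypothetical function names and parameter values): `powersleep_metrics` uses the simplified σ-based forms, and `response_time_via_starter` recomputes E(R) from the moments (3.10), (3.11) and Eqs. (3.7), (3.8) with the mean service time scaled by 1/r, so the two routes can be cross-checked.

```python
import math

def powersleep_metrics(lam, mu, r, r_l, alpha, gamma, p_idle, p_sleep,
                       dh, ds, dx, dw, de):
    """Mode probabilities, E(P) and E(R) from Eqs. (3.12)-(3.17)."""
    rho = lam / mu
    if not (r_l <= r <= 1.0 and r > rho):
        raise ValueError("need r_l <= r <= 1 and r > rho for a stable queue")
    d = ds + dx + dw
    a = math.exp(-lam * ds) * (1 - math.exp(-lam * de))
    sigma = math.exp(lam * dh) + lam * d - 1 + a + math.exp(-lam * (d + de))
    pi_r = rho / r                                   # Eq. (3.12)
    pi_t = (1 - pi_r) * lam * (ds + dw) / sigma      # Eq. (3.13)
    pi_s = (1 - pi_r) * (lam * dx + a) / sigma       # Eq. (3.14)
    pi_i = 1 - pi_r - pi_t - pi_s                    # Eq. (3.15)
    dyn = alpha * (r - r_l) ** gamma
    e_p = dyn * (pi_r + pi_t) + p_idle - (p_idle - p_sleep) * pi_s         # (3.16)
    e_r = 1 / (mu * (r - rho)) + (lam / 2 * d ** 2 + (dx + dw) * a) / sigma  # (3.17)
    return {"pi": (pi_r, pi_t, pi_s, pi_i), "E_P": e_p, "E_R": e_r}

def response_time_via_starter(lam, mu, r, dh, ds, dx, dw, de):
    """E(R) from Eqs. (3.7), (3.8), (3.10), (3.11) at speed ratio r."""
    d = ds + dx + dw
    a = math.exp(-lam * ds) * (1 - math.exp(-lam * de))
    b = math.exp(-lam * (d + de))
    scale = math.exp(-lam * dh)
    m1 = scale * (d - 1 / lam + (a + b) / lam)
    m2 = scale * ((d - 1 / lam) ** 2 + (1 - 2 * b) / lam ** 2
                  + 2 / lam * (dx + dw - 1 / lam) * a)
    e_rx = (m1 + 0.5 * lam * m2) / (1 + lam * m1)   # Eq. (3.7)
    es = 1 / (mu * r)                                # mean service time at ratio r
    return es / (1 - lam * es) + e_rx                # Eq. (3.8)

m = powersleep_metrics(0.5, 1.0, 0.8, 0.2, 100.0, 3.0, 60.0, 5.0,
                       1.0, 0.3, 0.8, 0.4, 2.0)
print(m["pi"], m["E_P"], m["E_R"])
```

Agreement between the two response-time routes is a quick consistency check on σ, whose leading term is $e^{+\lambda\delta_h}$.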
Let $\hat{R}$ denote the mean response time threshold. The optimization problem of minimizing the mean power consumption under a given mean response time constraint can be formulated as follows:
$$\begin{aligned} \text{minimize} \quad & E[P] && (3.18a)\\ \text{subject to} \quad & E[R] \le \hat{R}, && (3.18b)\\ & \max\{r_l, \rho\} \le r \le 1. && (3.18c) \end{aligned}$$
The optimization problem defined in (3.18) can in general be solved with the Lagrangian function
$$L = E[P] + \chi\,E[R] \qquad (3.19)$$
To solve this optimization problem, we set $\partial L/\partial r = 0$, $\partial L/\partial \delta_h = 0$, $\partial L/\partial \delta_x = 0$, and $\partial L/\partial \delta_e = 0$, and denote the corresponding optimal values by $r^*$, $\delta_h^*$, $\delta_x^*$, and $\delta_e^*$.
In an M/G/1/PS server under PowerSleep, the minimal power consumption $E(P^*)$ under the mean response time threshold $\hat{R}$ can be achieved with the optimal configuration $r^*$, $\delta_h^*$, $\delta_x^*$, and $\delta_e^*$.
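The stationarity conditions of (3.19) generally have no closed-form solution. As an easy-to-verify stand-in, the sketch below solves (3.18) by brute-force grid search instead of the Lagrangian method: it evaluates (3.16) and (3.17) on a grid of $(r, \delta_h, \delta_x, \delta_e)$ and keeps the feasible point with the lowest mean power. This is a different technique from the one in the text, and the grids and parameter values are hypothetical.

```python
import itertools
import math

def metrics(lam, mu, r, r_l, alpha, gamma, p_idle, p_sleep, dh, ds, dx, dw, de):
    """E(P) and E(R) of Eqs. (3.16) and (3.17)."""
    rho = lam / mu
    d = ds + dx + dw
    a = math.exp(-lam * ds) * (1 - math.exp(-lam * de))
    sigma = math.exp(lam * dh) + lam * d - 1 + a + math.exp(-lam * (d + de))
    idle_share = 1 - rho / r
    e_p = (alpha * (r - r_l) ** gamma * (rho / r + idle_share * lam * (ds + dw) / sigma)
           + p_idle - (p_idle - p_sleep) * idle_share * (lam * dx + a) / sigma)
    e_r = 1 / (mu * (r - rho)) + (lam / 2 * d ** 2 + (dx + dw) * a) / sigma
    return e_p, e_r

def grid_search(lam, mu, r_hat, r_l, alpha, gamma, p_idle, p_sleep, ds, dw,
                r_grid, dh_grid, dx_grid, de_grid):
    """Return (E_P, E_R, r, dh, dx, de) of the best feasible grid point, or None."""
    rho = lam / mu
    best = None
    for r, dh, dx, de in itertools.product(r_grid, dh_grid, dx_grid, de_grid):
        if not (r_l <= r <= 1.0 and r > rho):
            continue                      # (3.18c), excluding the unstable r = rho
        e_p, e_r = metrics(lam, mu, r, r_l, alpha, gamma, p_idle, p_sleep,
                           dh, ds, dx, dw, de)
        if e_r <= r_hat and (best is None or e_p < best[0]):  # (3.18b), (3.18a)
            best = (e_p, e_r, r, dh, dx, de)
    return best

best = grid_search(0.5, 1.0, 6.0, 0.2, 100.0, 3.0, 60.0, 5.0, 0.3, 0.4,
                   [0.6, 0.7, 0.8, 0.9, 1.0], [0.5, 1.0, 2.0],
                   [0.0, 0.5, 1.0], [0.0, 1.0, 2.0])
print(best)
```

A grid only approximates $(r^*, \delta_h^*, \delta_x^*, \delta_e^*)$; refining the grid around the returned point, or using its values to initialize the Lagrangian conditions, would tighten the result.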
3.5 Conclusion
This chapter explored how to minimize the mean power consumption of a server under a mean response time constraint in order to reduce the power cost. PowerSleep, the smart power-saving scheme proposed in this chapter, applies both DVFS and DPM and puts the server into a low-power sleep mode. By adopting the extended M/G/1/PS queuing model for job arrival and execution, PowerSleep shows how to jointly decide the execution speed for jobs and the sleep period such that the mean response time constraint is satisfied and the mean power consumption is minimized.
CHAPTER 4
Voltage Island Aware Energy Efficient Scheduling of Real-Time Tasks on Multi-Core Processors
4.1 Introduction
This chapter studies energy-efficient scheduling of periodic real-time tasks on multi-core processors with voltage islands, in which cores are partitioned into multiple blocks (termed voltage islands) and each block has its own power source to supply voltage. Cores in the same block always operate at the same voltage level, which can be adjusted using Dynamic Voltage and Frequency Scaling (DVFS). The Voltage Island Largest Capacity First (VILCF) algorithm is proposed for energy-efficient scheduling of periodic real-time tasks on such processors. It achieves better energy efficiency by fully utilizing the remaining capacity of an island before turning on more islands or increasing the voltage level of the currently active islands. A detailed theoretical analysis of the approximation ratio of VILCF in terms of energy efficiency is provided. In addition, experimental results show that VILCF significantly outperforms existing algorithms when there are multiple cores in a voltage island.
4.2 System Models
4.2.1 Power Model
In this study, we adopt a practical and widely accepted power model [33] [34] [43]. The power consumption of a core is composed of two parts: leakage power and dynamic power. The dynamic power, denoted $P_d$, can be adjusted by DVFS and increases with the working speed s of the core:
$$P_d(s) = C_{ef} V_{dd}^2 s \qquad (4.1)$$
where $V_{dd}$ denotes the working voltage of the cores, $C_{ef}$ the effective switching capacitance, and $s = \kappa \frac{(V_{dd}-V_t)^2}{V_{dd}}$, with $V_t$ the threshold voltage and κ a hardware-design-specific constant ($V_{dd} \ge V_t \ge 0$, $\kappa > 0$, $C_{ef} > 0$).
We assume that the leakage power is a constant, denoted $\beta_2$. The total power consumption per core is the sum of the leakage power and the dynamic power, $P_d(s) + \beta_2$. We normalize the power consumption function as
$$P(s) = s^3 + \beta \qquad (4.2)$$
where $s^3$ represents the dynamic power and β the leakage power; see [38].
From equation (4.2), P(s) is a convex and non-decreasing function of the core speed s. Its curve is drawn in Figure (4.1), where $s_{min}$ and $s_{max}$ denote the lower and upper bounds of s, respectively. There exists a critical speed $s_0$ at which the energy consumed per unit of work, $P(s)/s$, is minimal (in this study, we assume $s_{min} \le s_0 \le s_{max}$). Solving $\frac{d}{ds}\left(\frac{P(s)}{s}\right) = 2s - \frac{\beta}{s^2} = 0$ gives $s_0 = \sqrt[3]{\beta/2}$; see Figure (4.1).
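The critical speed follows from minimizing the energy per unit of work, $P(s)/s = s^2 + \beta/s$. A minimal sketch; the clamping to $[s_{min}, s_{max}]$ is an added assumption for the case the text excludes, and it is valid because $P(s)/s$ is convex for $s > 0$. Function names are illustrative.

```python
def energy_per_work(s: float, beta: float) -> float:
    """Energy per unit of work, P(s)/s, for P(s) = s^3 + beta."""
    return (s ** 3 + beta) / s

def critical_speed(beta: float, s_min: float, s_max: float) -> float:
    """s0 = (beta/2)^(1/3), clamped to the available speed range."""
    s0 = (beta / 2.0) ** (1.0 / 3.0)
    return min(max(s0, s_min), s_max)

beta = 0.25
s0 = critical_speed(beta, 0.1, 2.0)
print(s0, energy_per_work(s0, beta))
```

Running slower than $s_0$ stretches the time over which the leakage power β is paid, while running faster increases the dynamic part; $s_0$ balances the two.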
Figure 4.1: Power consumption versus speed (axes: speed s, power P; marked speeds $s_0$, $s_{min}$, $s_{max}$).
4.2.2 Periodic Real-Time Task Model
We focus on a set of periodic, independent real-time tasks denoted by $T = \{\tau_i(\phi_i, p_i, c_i), i = 1, \ldots, n\}$. Each task $\tau_i$ consists of a sequence of task instances, referred to as jobs, which arrive periodically. For task $\tau_i$, $\phi_i$ is the phase, i.e., the release time of the first task instance, $p_i$ is the period, and $c_i$ is the computation time. We define the size of task $\tau_i$ as $c_i/p_i$. For two tasks $\tau_i$ and $\tau_j$, we say that $\tau_i$ is greater than $\tau_j$ when $c_i/p_i > c_j/p_j$, and that $\tau_i$ is less than $\tau_j$ when $c_i/p_i < c_j/p_j$. We assume the deadline of each task equals its period $p_i$. Further, we assume $\phi_i = 0$, i.e., all tasks arrive at time 0.
Given a set T of tasks, the hyper-period of T, denoted L, is defined as the least common multiple (LCM) of the periods of the tasks in T. For a given core with a set of n assigned periodic tasks, the processor utilization factor U is the fraction of processor time spent executing the task set; each task contributes $c_i/p_i$, so the utilization factor of the n tasks on this core is
$$U = \sum_{i} \frac{c_i}{p_i} \qquad (4.3)$$
If no task is assigned to a core, we say the core is empty.
For uniprocessor scheduling of independent real-time tasks, the earliest-deadline-first (EDF) algorithm is optimal on preemptive uniprocessors. To meet the timing constraints of all tasks on one core, the working speed of the core must be at least U, i.e., $U \le s$.
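The utilization factor, the hyper-period, and the EDF speed condition above can be computed exactly with rational arithmetic. A small sketch, assuming integer periods and computation times expressed in time units at unit speed; tasks are given as hypothetical `(c_i, p_i)` pairs. `math.lcm` with several arguments requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm

def utilization(tasks):
    """U of Eq. (4.3) for (c_i, p_i) pairs, as an exact fraction."""
    return sum(Fraction(c, p) for c, p in tasks)

def hyper_period(tasks):
    """Hyper-period L: least common multiple of the (integer) task periods."""
    return lcm(*(p for _, p in tasks))

def edf_feasible(tasks, speed):
    """Preemptive EDF with deadlines equal to periods meets all deadlines iff U <= speed."""
    return utilization(tasks) <= speed

tasks = [(1, 4), (2, 6), (1, 12)]
print(utilization(tasks), hyper_period(tasks), edf_feasible(tasks, 0.7))
```

Using `Fraction` avoids rounding errors when U lands exactly on a candidate speed.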
4.2.3 Multi-Core Processors with Voltage Islands
For chip multi-core processors (CMPs) with voltage islands, we assume that all cores are homogeneous. Cores are partitioned into M blocks (M ≥ 2), and each block contains the same number of cores, denoted K (K ≥ 1). Each block has only one supply voltage line. Cores of the same block are set to the same voltage $V_{dd}$ and consume the same power. Voltage can be adjusted dynamically at the block level using DVFS, and different blocks can work under different voltages. We also apply the DPM technique to turn blocks on and off. If no task is assigned to a block, the block can be turned off and put into the dormant mode. When a new task arrives, the block can be reactivated. Activating a block from the dormant mode to the active mode requires additional energy $E_{sw}$ and a time overhead. In this study, to simplify our model, we do not consider this overhead and let $E_{sw} = 0$. Combining DVFS and DPM can reduce the power consumption of the cores and make them work energy-efficiently.
We denote by (m, k), with 0 ≤ m ≤ M − 1 and 0 ≤ k ≤ K − 1, core k of block m, and let the index of core (m, k) be $m \cdot K + k$. The index of core (0, 0) is the smallest of all cores, while the index of core (M − 1, K − 1) is the largest. We denote by $T_{m,k}$ the set of tasks assigned to core (m, k) and by $l_{m,k}$ the load of core (m, k), where $l_{m,k} = \sum_{\tau_i \in T_{m,k}} c_i/p_i$. We let $l^{max}_m = \max_k l_{m,k}$ and call it the limit of block m. We define the critical core of block m as the core of this block whose load is $l^{max}_m$. The working speed and voltage of block m are determined by the load of the critical core of that block. We denote by $t_{m,k}$ the capacity of core (m, k), where $t_{m,k} = l^{max}_m - l_{m,k}$; the capacities of all critical cores are 0.
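The indexing, block limits, and capacities defined above can be expressed compactly. A minimal sketch with illustrative function names, assuming per-core loads are stored in a flat list ordered by core index $m \cdot K + k$.

```python
def core_index(m: int, k: int, K: int) -> int:
    """Linear index m*K + k of core (m, k)."""
    return m * K + k

def block_limits_and_capacities(loads, M, K):
    """Return each block's limit l^max_m and each core's capacity t_{m,k}."""
    limits = [max(loads[m * K:(m + 1) * K]) for m in range(M)]
    caps = [limits[i // K] - loads[i] for i in range(M * K)]
    return limits, caps

loads = [6, 5, 5, 4, 3, 3, 0, 0]   # two blocks of four cores
print(block_limits_and_capacities(loads, M=2, K=4))
```

A critical core is any core whose capacity comes out as 0.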
4.3 Problem Definition and An Approximation Algorithm
We are interested in finding an algorithm that schedules a set of periodic, independent real-time tasks on a CMP with voltage islands while minimizing the power consumption under the timing constraints.
4.3.1 Voltage Island Energy-Efficiency Scheduling (VIEES)
Definition 1. Consider a set T of real-time tasks to be scheduled on a CMP with M blocks and K identical cores in each block. The power function of each core is $P(s) = s^3 + \beta$, where β > 0, and all tasks are ready at time 0. Each periodic task $\tau_i \in T$ is associated with a computation requirement of $c_i$ CPU cycles and a period $p_i$, where the relative deadline of $\tau_i$ is $p_i$. The cores can work at any speed in $[s_{min}, s_{max}]$. The problem is to minimize the energy consumption in the hyper-period L when scheduling the tasks in T without missing any timing constraint, where each task is executed entirely on one core.
Lemma 1. VIEES is an NP-hard problem.
Proof. VIEES is an optimization problem. Since a decision problem is no harder than its optimization counterpart, VIEES is NP-hard if its corresponding decision problem is NP-hard. A corresponding decision problem of VIEES is to decide whether a set of tasks can be assigned to cores such that the sums of the task sizes on all cores are equal.
The NP-hardness of the decision problem is proved by a reduction from the 3-PARTITION problem, which is NP-complete [79, 80]. The 3-PARTITION problem is expressed as follows: given n numbers $a_1, \ldots, a_n \in S$, decide whether there are sets $S_1, S_2, S_3 \subset S$ with $S_1 \cap S_2 = S_1 \cap S_3 = S_2 \cap S_3 = \emptyset$ and $S_1 \cup S_2 \cup S_3 = S$ such that
$$\sum_{a_i \in S_1} a_i = \sum_{a_j \in S_2} a_j = \sum_{a_k \in S_3} a_k.$$
Given a 3-PARTITION instance, an instance of the VIEES problem is created by setting the task sizes to $c_i/p_i = a_i / \sum_{j=1}^{n} a_j \in (0, 1]$. The workload of the cores can be perfectly balanced (the sums of the task sizes on all cores are equal) if and only if the set S can be partitioned into three disjoint sets with equal sums.
4.3.2 Energy Consumption
Given a set T of independent real-time tasks, we partition it into task subsets $T_{0,0}, T_{0,1}, \ldots, T_{M-1,K-1}$ and assign $T_{m,k}$ to core (m, k) for all 0 ≤ m ≤ M − 1, 0 ≤ k ≤ K − 1. These task subsets are disjoint: $T_{m,k} \cap T_{i,j} = \emptyset$ whenever $i \ne m$ or $j \ne k$. We then assign working speeds to the blocks. We call the assigned task subsets together with the working speeds of the blocks a schedule SC. A schedule is feasible if all core speeds assigned for its time intervals are valid, no task misses its timing constraint, and each task is executed entirely on one core. The energy consumption of a schedule SC is denoted Φ(SC). A schedule is optimal if it is feasible and its energy consumption is the minimum among all feasible schedules.
Due to the convexity of the power consumption function, the minimum-energy schedule for the tasks in $T_{m,k}$ executes all of them at speed $s_0$ if $l_{m,k} \le s_0$, or at speed $l_{m,k} = \sum_{\tau_i \in T_{m,k}} c_i/p_i$ if $l_{m,k} \ge s_0$; see [38]. A core enters the dormant mode when it completes all its jobs and becomes idle.
Consider the minimum energy consumption of the tasks $\tau_i \in T_{m,k}$ executed on core (m, k) in the hyper-period L. If $l_{m,k} \le s_0$, the minimum energy consumption is $\frac{L}{s_0} \cdot l_{m,k} \cdot P(s_0)$; otherwise, it is $L \cdot P(l_{m,k})$. We define the energy consumption function ψ(l) for the tasks in $T_{m,k}$ completed on core (m, k) during time L with load $l = l_{m,k}$ as:
$$\psi(l) = \begin{cases} L(l^3 + \beta), & \text{if } l > s_0 \\ L\,\frac{l}{s_0}\,(s_0^3 + \beta), & \text{otherwise} \end{cases} \qquad (4.4)$$
For a schedule SC, $\Phi(SC) = \sum_{m=0}^{M-1} K \cdot \psi(l_{m,0})$, because in the voltage island model $\psi(l_{m,k})$ is equal to $\psi(l_{m,0})$ for 1 ≤ k ≤ K − 1.
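Eq. (4.4) and the block-level energy Φ(SC) can be evaluated as follows. A minimal sketch with illustrative function names; `block_limits` holds the critical-core load $l_{m,0}$ of each block, and, as in the text, every core of a block is charged ψ of that load.

```python
def psi(l: float, L: float, beta: float) -> float:
    """Energy of Eq. (4.4) for load l over hyper-period L with P(s) = s^3 + beta."""
    s0 = (beta / 2.0) ** (1.0 / 3.0)      # critical speed
    if l > s0:
        return L * (l ** 3 + beta)        # run at speed l for the whole hyper-period
    return L * (l / s0) * (s0 ** 3 + beta)  # run at s0, then go dormant

def schedule_energy(block_limits, K: int, L: float, beta: float) -> float:
    """Phi(SC) = sum over blocks of K * psi(l_{m,0})."""
    return sum(K * psi(lm, L, beta) for lm in block_limits)

print(schedule_energy([1.0, 0.25], K=2, L=12, beta=0.25))
```

With β = 0.25 the critical speed is 0.5, so the block with load 0.25 runs at $s_0$ for half the hyper-period, while the block with load 1.0 runs at full load.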
The following lemmas, which result from the convexity of the power consumption function, will be widely used in the algorithmic analysis of this study [38].
Lemma 2. $\psi(\gamma x + (1-\gamma) y) \le \gamma\psi(x) + (1-\gamma)\psi(y)$ for any non-negative reals x, y and 0 ≤ γ ≤ 1.
Lemma 3. Suppose that $l_m + l_n = l'_m + l'_n$ and $l_m \le l'_m, l'_n \le l_n$; then $\psi(l'_m) + \psi(l'_n) \le \psi(l_m) + \psi(l_n)$.
4.3.3 Voltage Island Largest Capacity First (VILCF) Algorithm
Because VIEES is NP-hard, we explore heuristic scheduling algorithms. In this study, we propose a task scheduling algorithm, Voltage Island Largest Capacity First (VILCF), for the VIEES problem.
Algorithm 1 VILCF
Input: T, M, K
Output: $T_{0,0}, T_{0,1}, \ldots, T_{M-1,K-1}$
1: sort all tasks $\tau_i \in T$ in non-increasing order of $c_i/p_i$
2: set $l_{0,0}, l_{0,1}, \ldots, l_{M-1,K-1}$ to 0
3: set $t_{0,0}, t_{0,1}, \ldots, t_{M-1,K-1}$ to 0
4: set $T_{0,0}, T_{0,1}, \ldots, T_{M-1,K-1}$ to $\emptyset$
5: for i = 1 to |T| do
6:   find the largest $t_{m,k}$ (break ties by core index)
7:   if $c_i/p_i \le t_{m,k}$ then
8:     $T_{m,k} \leftarrow T_{m,k} \cup \{\tau_i\}$ and $l_{m,k} \leftarrow l_{m,k} + c_i/p_i$
9:   else find the smallest $l_{m,k}$ (break ties by core index)
10:    $T_{m,k} \leftarrow T_{m,k} \cup \{\tau_i\}$ and $l_{m,k} \leftarrow l_{m,k} + c_i/p_i$
11:  sort the task subsets $T_{m,k}$ in non-increasing order of load $l_{m,k}$, map them to the cores by index, and set $t_{m,k} \leftarrow l^{max}_m - l_{m,k}$
12: end for
With VILCF, the capacities and loads of all cores are initially set to 0 and the task subsets assigned to the cores are set to ∅ ($t_{m,k} = 0$, $l_{m,k} = 0$, $T_{m,k} = \emptyset$ for m = 0, ..., M − 1 and k = 0, ..., K − 1), and the tasks in T are sorted in non-increasing order of their sizes $c_i/p_i$. We take the largest remaining task and compare it with the largest capacity over all cores. If the task is no greater than the largest capacity, it is assigned to the core with the largest capacity; otherwise, it is assigned to the core with the minimum load. After this assignment, we sort the task subsets by load in non-increasing order, map them to the cores by index, recompute the capacities of all cores, and continue with the next task. Core (m, 0) is the critical core of block m for 0 ≤ m ≤ M − 1, and $l_{m,0}$ is the limit of block m ($l^{max}_m = l_{m,0}$), because core (m, 0) has the smallest index in block m. The time complexity of VILCF is O(|T| log |T|).
For example, consider a chip with 8 cores partitioned into two blocks of four cores each, onto which we schedule the task set T = [6, 5, 5, 4, 3, 3, 2, 2, 2, 2] (task sizes $c_i/p_i$). Initially, $l_{m,k} = 0$, $t_{m,k} = 0$, and $T_{m,k} = \emptyset$ for m ∈ {0, 1} and k ∈ {0, 1, 2, 3}. Task $c_0/p_0 = 6$ is greater than all capacities, so it is assigned to core (0, 0), whose index and load are both the smallest; core (0, 0) becomes the critical core of block 0, the limit of block 0 becomes $l^{max}_0 = 6$, and the capacities of block 0 are updated to $t_{0,0} = 0$ and $t_{0,1} = t_{0,2} = t_{0,3} = 6$. Task $c_1/p_1 = 5$ is assigned to core (0, 1), which has the largest capacity (6) and the smallest index, and $t_{0,1}$ becomes 1; likewise, tasks $c_2/p_2 = 5$ and $c_3/p_3 = 4$ are assigned to cores (0, 2) and (0, 3), respectively. Task $c_4/p_4 = 3$ is greater than all capacities, so we select core (1, 0), which has the minimum load and the smallest index; block 1 now has limit 3, so task $c_5/p_5 = 3$ is assigned to core (1, 1). For task $c_6/p_6 = 2$, the largest capacity is 3, on cores (1, 2) and (1, 3), so it is assigned to core (1, 2), and task $c_7/p_7 = 2$ is then assigned to core (1, 3). The largest capacity is now 2, on core (0, 3), so task $c_8/p_8 = 2$ is assigned there, and the sorting step maps the resulting subset {4, 2} (load 6) to core (0, 1). Task $c_9/p_9 = 2$ is greater than the capacities of all cores, so it is assigned to the core with the minimum load and the smallest index, core (1, 2). After the final sorting step, we obtain $l_{0,0} = 6$, $l_{0,1} = 6$, $l_{0,2} = 5$, $l_{0,3} = 5$, $l_{1,0} = 4$, $l_{1,1} = 3$, $l_{1,2} = 3$, $l_{1,3} = 2$.
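The assignment loop described above can be sketched in Python as follows. This is a minimal, hypothetical helper (`vilcf` is our name, not the author's code); it assumes that core (m, k) has index m·K + k, that ties on the largest capacity and on the minimum load are broken by the smallest core index (as in the example), and it recomputes every capacity at each step for clarity, so it does not attain the O(|T| log |T|) bound.

```python
from typing import List


def vilcf(sizes: List[float], num_blocks: int, cores_per_block: int) -> List[List[float]]:
    """Sketch of the VILCF loop; returns the final load l_{m,k} of every core."""
    n_cores = num_blocks * cores_per_block
    # subsets[idx] holds the tasks mapped to core index idx = m*K + k
    subsets: List[List[float]] = [[] for _ in range(n_cores)]

    def load(idx: int) -> float:
        return sum(subsets[idx])

    def capacity(idx: int) -> float:
        m = idx // cores_per_block
        limit = load(m * cores_per_block)  # l^max_m = l_{m,0} (critical core)
        return limit - load(idx)

    for size in sorted(sizes, reverse=True):  # largest task first
        caps = [capacity(i) for i in range(n_cores)]
        best_cap = max(caps)
        if size <= best_cap:
            target = caps.index(best_cap)  # largest capacity, smallest index
        else:
            loads = [load(i) for i in range(n_cores)]
            target = loads.index(min(loads))  # minimum load, smallest index
        subsets[target].append(size)
        # Sorting step: subsets in non-increasing load order, remapped by index.
        # Python's sort is stable, also with reverse=True.
        subsets.sort(key=sum, reverse=True)

    return [[load(m * cores_per_block + k) for k in range(cores_per_block)]
            for m in range(num_blocks)]


print(vilcf([6, 5, 5, 4, 3, 3, 2, 2, 2, 2], num_blocks=2, cores_per_block=4))
```

On the example task set, the sketch reproduces the final loads listed above.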
4.4 Lower Bound of Energy Consumption of the VIEES Problem
In this section, we derive a lower bound on the energy consumption of the VIEES problem, which is used to establish the approximation ratio of Algorithm VILCF. The approximation ratio is obtained by comparing $\Phi(SC_{VILCF})$ with the lower bound of the optimal solutions over all input instances.
4.4.1 Semi-VIEES Problem
We assume the tasks in set T are sorted in non-increasing order of task size, $c_i/p_i \ge c_{i+1}/p_{i+1}$. Let $k^+$ be the largest index satisfying that, after this task is scheduled, no core is empty, and let $T^+$ be the set of the first $k^+$ tasks of T.
[Figure: tasks T1 to T14 scheduled on cores (0,0) to (2,2)]
Figure 4.2: Optimal schedule of the Semi-VIEES problem if $\sum_{m=0}^{M-1}\sum_{k=0}^{K-1}(l^{max}_m - l'_{m,k}) \le \sum_{\tau_i\in T\setminus T^f} c_i/p_i$, M = 3, K = 3.
[Figure: tasks T1 to T14 scheduled on cores (0,0) to (2,2)]
Figure 4.3: Optimal schedule of the Semi-VIEES problem if $\sum_{m=0}^{M-1}\sum_{k=0}^{K-1}(l^{max}_m - l'_{m,k}) > \sum_{\tau_i\in T\setminus T^f} c_i/p_i$, M = 3, K = 3.
A task is called a completing task if, when it is scheduled, it is no greater than the largest capacity and the core to which it is assigned is not empty.
Lemma 4. Scheduling a completing task does not change the task subsets assigned to the critical
cores by Algorithm VILCF.
Proof. Suppose task $\tau_i$ is assigned to core (m, k) as a completing task. Then $c_i/p_i \le l^{max}_m - l_{m,k}$, so $c_i/p_i + l_{m,k} \le l_{m,0}$. Since $c_i/p_i + l_{m,k}$, the load of the changed task subset, does not exceed the load of the critical core of the same block, scheduling $\tau_i$ does not change the task subset assigned to the critical core.
Lemma 5. After the tasks in $T^+$ are scheduled, $|T_{m,0}| = 1$ for $m = 0, 1, \ldots, M-1$.
Proof. There is always an empty core before the task with index $k^+$ is assigned by Algorithm VILCF. When a task $\tau_i$ is assigned to a block m that has no task, it is assigned to core (m, 0), which becomes the critical core of block m; after this assignment, the capacity of core (m, 0) is 0, and the capacities of all other cores of block m are updated to values greater than 0. When a new task arrives, if it is a completing task, it is assigned to a core with the largest capacity, and by Lemma 4 no task subset of a critical core changes; otherwise, the new task is assigned to an empty core, because the load of an empty core, 0, is the minimum. Hence no further task is assigned to the critical cores, each of which already has one task.
Let $k^*$ be the largest index such that task $\tau^*$, of size $c^*/p^*$, is not a completing task and is assigned to a core that holds only one task $\tau'$ with $c^*/p^* \ge \frac12 c'/p'$. Let $T^f$ be the set of the first $k^*$ tasks of T, and denote by $T\setminus T^f$ the set of tasks remaining in T after removing $T^f$. We observe that $k^+ \le k^*$, and denote by $T^f\setminus T^+$ the set of tasks remaining in $T^f$ after removing $T^+$.
Lemma 6. Algorithm VILCF schedules the tasks in $T^f$ optimally.
Proof. By Lemma 5, each critical core has only one task assigned after the tasks in $T^+$ are scheduled. We continue by scheduling a task $\tau_i \in T^f\setminus T^+$ onto a core that contains only one task $\tau_j$, so that the newly assigned task subset has two tasks.
The load of the new task subset is no greater than that of any core that already has completing tasks assigned to it. Suppose core (m, k) has completing tasks; then there is a task $\tau_k \in T_{m,k}$ with $c_k/p_k \ge c_j/p_j$, because Algorithm VILCF schedules the largest task first. Moreover, $\tau_i$ is no greater than any other task in $T_{m,k}$, since $\tau_i$ is the smallest of all assigned tasks, and core (m, k) holds at least two tasks; hence the load $c_i/p_i + c_j/p_j$ is no greater than that of core (m, k).
Therefore, scheduling a task $\tau_i \in T^f\setminus T^+$ neither changes the index of any task subset that already has completing tasks nor changes the task subset of the critical core in the same block, so after the sorting step of Algorithm VILCF no task subset with completing tasks is assigned to a critical core. Hence, after the tasks in $T^f\setminus T^+$ are scheduled, each critical core contains at most two tasks, and for any critical core that holds two tasks, one of them comes from $T^f\setminus T^+$ and is at least half the size of the other.
We cannot swap tasks between a critical core (m, 0) and a non-critical core (m', k') ($k' \ne 0$, $0 \le m, m' \le M-1$). Assume core (m, 0) has two tasks $\tau_i, \tau_j$ with $c_i/p_i \ge c_j/p_j$. If we swap $\tau_i$ with any task in core (m', k'), the swap increases $l_{m',k'}$ beyond $l_{m',0}$, since $\tau_i$ is not a completing task; if we swap $\tau_j$, the swap increases $l_{m,0}$, since $\tau_j$ is smaller than any task in core (m', k').
Only critical cores are examined here. For any three tasks $\tau_i, \tau_j, \tau_k$ assigned to critical cores that hold two tasks, the following inequality holds:
$$\frac{c_i}{p_i} \le \frac{c_j}{p_j} + \frac{c_k}{p_k} \qquad (4.5)$$
We can transform a schedule SC into another schedule that assigns at most two tasks of $T^f$ to each critical core without increasing the energy consumption. Suppose that, in schedule SC, three tasks $\tau_i, \tau_j, \tau_k$ are assigned to core (m, 0); then there is a core (m', 0) to which Algorithm VILCF assigns two tasks but which holds only one task $\tau_l$ in SC, and Equation (4.5) holds for all these tasks. We move task $\tau_i$ from core (m, 0) to core (m', 0) to create a new schedule SC', with $T^{SC'}_{m,0} = T^{SC}_{m,0}\setminus\{\tau_i\}$ and $T^{SC'}_{m',0} = T^{SC}_{m',0}\cup\{\tau_i\}$. By Equation (4.5), $l^{SC}_{m',0} < l^{SC'}_{m',0}$ and $l^{SC}_{m,0} > l^{SC'}_{m,0}$, and then, by Lemma 3,
$$\psi(l^{SC}_{m',0}) + \psi(l^{SC}_{m,0}) > \psi(l^{SC'}_{m',0}) + \psi(l^{SC'}_{m,0}).$$
We first schedule the tasks in $T^f$ by applying Algorithm VILCF, and let $l'_{m,k}$ denote the load of core (m, k) under this schedule. If $T\setminus T^f \ne \emptyset$, we relax the constraints of the VIEES problem by allowing task migration and simultaneous execution on multiple cores for the tasks in $T\setminus T^f$, so that these tasks can execute on more than one core at the same time. We call this relaxed problem the Semi-VIEES problem.
4.4.2 Optimal Solution of Semi-VIEES Problem
In the voltage island model, for a core (m, k) with $t_{m,k} > 0$ ($0 \le m \le M-1$, $0 \le k \le K-1$), we can add load δ to it without increasing the energy consumption if $l_{m,k} + \delta \le l^{max}_m$. The total load that can be added to the cores of block m is $\sum_{k=0}^{K-1}(l^{max}_m - l'_{m,k})$, which is represented by the shaded area of block m in Figure 4.2.
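Here we present an optimal schedule SC* of the Semi-VIEES problem. Let $\Gamma = \sum_{\tau_i\in T\setminus T^f} c_i/p_i$, and let λ and $\lambda_m$ for $0 \le m \le M-1$ be the positive values that satisfy:
$$\text{Minimize } \lambda \qquad (4.6a)$$
$$\text{Maximize } \lambda_m \qquad (4.6b)$$
subject to
$$\sum_{m=0}^{M-1}\sum_{k=0}^{K-1}\max\{\lambda - l'_{m,k},\ \max\{\lambda_m - l'_{m,k},\, 0\}\} = \Gamma \qquad (4.6c)$$
$$\lambda_m \le l'_{m,0},\quad m = 0, \ldots, M-1 \qquad (4.6d)$$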
In the Semi-VIEES problem, since task migration and simultaneous execution of a task on multiple cores are allowed for the tasks in $T\setminus T^f$, we can distribute the computation of these tasks among the cores in two steps. In the first step, for each core (m, k) in the m-th block with $t_{m,k} \ge 0$ and $0 \le m \le M-1$, if $\lambda_m \ge l'_{m,k}$, we distribute load $\lambda_m - l'_{m,k}$ of the tasks in $T\setminus T^f$ to core (m, k), so that the resulting $l'_{m,k}$ equals $\lambda_m$; after this step, every $l'_{m,k}$ in block m is greater than or equal to $\lambda_m$. Since $\lambda_m \le l'_{m,0}$, this distribution does not increase the energy consumption. We would like to distribute as much of the load of $T\setminus T^f$ as possible in this step, but because of the constraint $\lambda_m \le l'_{m,0}$, some load of $T\setminus T^f$ may remain after the first step. In the second step, we distribute the remaining load of $T\setminus T^f$: for each core (m, k) with $0 \le m \le M-1$ and $0 \le k \le K-1$, if $\lambda > l'_{m,k}$, we distribute load $\lambda - l'_{m,k}$ to core (m, k), so that the resulting $l'_{m,k}$ equals λ; after this step, every $l'_{m,k}$ is greater than or equal to λ.
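The two-step distribution can be sketched in Python as a fill-then-water-fill procedure. This is an illustrative sketch with hypothetical function names, not the author's implementation. It assumes the input loads $l'_{m,k}$ come from VILCF and are sorted, so $l'_{m,0}$ is the largest load of block m; it takes $\lambda_m = l'_{m,0}$ and fills the blocks in index order in the first step (the text leaves open how first-step load is split across blocks); and it computes the common level λ of the second step by water-filling. The function returns the resulting limit of each block, which is what enters $\Phi(SC^*) = \sum_m K\,\psi(l'_{m,0})$.

```python
from typing import List


def water_level(levels: List[float], amount: float) -> float:
    """Common level lam with sum(max(lam - x, 0) for x in levels) == amount."""
    lv = sorted(levels)
    prefix = 0.0
    for i, x in enumerate(lv, start=1):
        prefix += x
        lam = (amount + prefix) / i  # raise the i lowest cores to one level
        if i == len(lv) or lam <= lv[i]:
            return lam
    raise ValueError("levels must be non-empty")


def semi_viees_limits(loads: List[List[float]], gamma: float) -> List[float]:
    """Distribute total load gamma of T \\ T^f in two steps; return block limits."""
    new = [list(block) for block in loads]
    remaining = gamma
    # Step 1: raise every core of block m up to l'_{m,0} (lambda_m = l'_{m,0}).
    for block in new:
        limit = block[0]
        for k, l in enumerate(block):
            add = min(limit - l, remaining)
            block[k] = l + add
            remaining -= add
    # Step 2: spread the leftover over all cores with a common level lambda.
    if remaining > 0:
        lam = water_level([l for block in new for l in block], remaining)
        new = [[max(l, lam) for l in block] for block in new]
    return [max(block) for block in new]


print(semi_viees_limits([[6, 4], [3, 1]], 2))   # fits in the slack of step 1
print(semi_viees_limits([[6, 4], [3, 1]], 12))  # needs step 2
```

The first call corresponds to the case of Figure 4.3 (the slack exceeds Γ, so no block limit rises); the second corresponds to Figure 4.2, where the leftover load lifts all cores to a common level.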
Let SC* denote the resulting schedule, where $\Phi(SC^*) = \sum_{m=0}^{M-1} K\,\psi(l'_{m,0})$. We now prove the optimality of SC* and establish the relation between the loads in SC* and in $SC_{VILCF}$.
Figures 4.2 and 4.3 show the two cases of the optimal schedule of the Semi-VIEES problem with M = 3 and K = 3. In case 1 (Figure 4.2), $\sum_{m=0}^{M-1}\sum_{k=0}^{K-1}(l^{max}_m - l'_{m,k})$ is no greater than $\sum_{\tau_i\in T\setminus T^f} c_i/p_i$; in case 2 (Figure 4.3), it is greater than $\sum_{\tau_i\in T\setminus T^f} c_i/p_i$.
Lemma 7. SC* is an optimal schedule for the Semi-VIEES problem.
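Proof. Let SC be any feasible schedule of the Semi-VIEES problem, and consider two critical cores (m, 0) and (m', 0). Let $T^{SC}_{m,0}$ and $l^{SC}_{m,0}$ be, respectively, the set of tasks of $T^f$ and the load assigned to core (m, 0) in SC. Assume
$$\sum_{\tau_i\in T^{SC}_{m,0}}\frac{c_i}{p_i} > \sum_{\tau_i\in T^{SC}_{m',0}}\frac{c_i}{p_i} \quad\text{and}\quad T^{SC}_{m,0}\cup T^{SC}_{m',0} = T^{SC^*}_{m,0}\cup T^{SC^*}_{m',0}.$$
Since scheduling the first $k^*$ tasks is optimal in SC*, both $\sum_{\tau_i\in T^{SC^*}_{m,0}} c_i/p_i$ and $\sum_{\tau_i\in T^{SC^*}_{m',0}} c_i/p_i$ lie between $\sum_{\tau_i\in T^{SC}_{m,0}} c_i/p_i$ and $\sum_{\tau_i\in T^{SC}_{m',0}} c_i/p_i$.
Let $l_{m+m'} = l^{SC}_{m,0} + l^{SC}_{m',0} - \sum_{\tau_i\in T^{SC}_{m,0}\cup T^{SC}_{m',0}} c_i/p_i$, and let $\lambda'_{m+m'}$ be the value that satisfies
$$\max\Big(\lambda'_{m+m'} - \sum_{\tau_i\in T^{SC^*}_{m,0}}\frac{c_i}{p_i},\ 0\Big) + \max\Big(\lambda'_{m+m'} - \sum_{\tau_i\in T^{SC^*}_{m',0}}\frac{c_i}{p_i},\ 0\Big) = l_{m+m'}.$$
Then $l^{SC^*}_{m,0} = \max\big(\lambda'_{m+m'}, \sum_{\tau_i\in T^{SC^*}_{m,0}} c_i/p_i\big)$ and $l^{SC^*}_{m',0} = \max\big(\lambda'_{m+m'}, \sum_{\tau_i\in T^{SC^*}_{m',0}} c_i/p_i\big)$. Both $l^{SC^*}_{m,0}$ and $l^{SC^*}_{m',0}$ lie between $l^{SC}_{m,0}$ and $l^{SC}_{m',0}$, so by Lemma 3 and Equation (4.5),
$$\psi(l^{SC^*}_{m,0}) + \psi(l^{SC^*}_{m',0}) \le \psi(l^{SC}_{m,0}) + \psi(l^{SC}_{m',0}),$$
and we can transform SC into SC* without increasing the energy consumption.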
For the critical cores in schedules $SC_{VILCF}$ and SC*, it may happen that $l_{m,0} = l'_{m,0}$ for all $0 \le m \le M-1$; if not, let $\hat M$ be the smallest block index satisfying $l'_{\hat M,0} \ne l_{\hat M,0}$.
Lemma 8. $l_{\hat M,0}$ is at most $\frac32 l_{min}$, where $l_{min}$ is the minimum load of the M·K cores derived from Algorithm VILCF.
Proof. Consider core $(\hat M, 0)$. Since $l_{\hat M,0} \ne l'_{\hat M,0}$, the task subset $T_{\hat M,0}$ assigned to critical core $(\hat M, 0)$ changes when the tasks in $T\setminus T^f$ are scheduled by Algorithm VILCF, so by Lemma 4 at least one non-completing task in $T\setminus T^f$ is assigned to a core whose index is no larger than $\hat M$; the load of this core is no less than $l_{\hat M,0}$.
Consider the two cores (M − 1, K − 2) and (M − 1, K − 1), which have the least loads $l_{min}$ and $l'_{min}$ ($l'_{min} \le l_{min}$). A non-completing task $\tau' \in T\setminus T^f$ is assigned to core (M − 1, K − 1) by Algorithm VILCF, so the load of core (M − 1, K − 1) becomes $l'_{min} + c'/p'$, and after the sorting step of Algorithm VILCF the index of this task subset is no larger than $\hat M$. Since $\tau'$ is not in $T^f$, we know that $c'/p' \le \frac12 l'_{min}$, and we obtain
$$\frac{l'_{min} + c'/p'}{l_{min}} \le 1 + \frac{c'/p'}{l_{min}} \le \frac32.$$
Since $l_{\hat M,0}$ is no greater than $l'_{min} + c'/p'$, we obtain $l_{\hat M,0} \le \frac32 l_{min}$.
4.5 Approximation Ratio of Algorithm VILCF
We have derived a lower bound on the minimum energy consumption. We now show the approximation ratio of Algorithm VILCF, which is obtained by comparing $\Phi(SC_{VILCF})$ with this lower bound over all input instances. We first characterize the worst-case energy consumption of Algorithm VILCF.
Figure 4.4: Upper bound of energy consumption of the VIEES problem, M = 3, K = 3.
Lemma 9. In a schedule attaining the upper bound of energy consumption of the VIEES problem, $l_{m,k}$ is equal to $l_{m+1,0}$ for $0 \le m \le M-2$ and $1 \le k \le K-1$.
Proof. Consider two cores (m, k) and (m + 1, 0) with $k \ne 0$, which are in different blocks, and suppose $l_{m,k} > l_{m+1,0}$. Let $\delta = l_{m,k} - l_{m+1,0}$. The extra load δ does not increase the energy consumption of block m while more load is processed, so the energy efficiency is better when $l_{m,k}$ is greater than $l_{m+1,0}$ than when the two are equal; hence the worst case occurs when they are equal (see Figure 4.4).
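Lemma 10. Suppose
$$f(y) = \frac{\mu\,\psi(2y) + (\hat M - \mu)\,\psi(3y)}{\hat M\,\psi\big(\frac{\Lambda}{K\hat M}\big)}$$
for positive numbers $\hat M \ge 3$ and Λ and a non-negative number μ, where $y \ge 0$, $0 \le \mu \le \hat M$, and $\Lambda = K\big(\mu\cdot 2y + (\hat M - \mu)\cdot 3y\big) - (K-1)y$. Then
$$f(y) \le \frac{54\hat M^3 - 18\hat M\big(2\delta(3\hat M-1) - 3\hat M\delta\big) + 2\big(9\hat M^2 + 2\delta(3\hat M-1)\big)^{3/2}}{27\hat M\delta^2},$$
where $\delta = 2\hat M - 1$.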
Proof. We consider three cases: (1) $3y < s_0$, (2) $2y \le s_0 \le 3y$, and (3) $s_0 < 2y$. In the first case, the numerator and the denominator of f(y) are equal, hence f(y) = 1.
For the second case, there are two sub-cases: $s_0 \le \frac{\Lambda}{K\hat M}$ and $s_0 > \frac{\Lambda}{K\hat M}$. By definition,
$$\mu = \frac{3y\hat M - \frac{\Lambda + (K-1)y}{K}}{y}, \quad\text{and hence}\quad \mu \ge \frac{(3\hat M - 1)y - \frac{\Lambda}{K}}{y}.$$
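For the first sub-case, we have
$$f(y) \le \frac{K\Big(\frac{\frac{\Lambda}{K} + (1-2\hat M)y}{y}\big((3y)^3 + \beta\big) + \frac{(3\hat M-1)y - \frac{\Lambda}{K}}{y}\cdot\frac{2y}{s_0}\,s_0^3\Big)}{K\hat M\Big(\big(\frac{\Lambda}{K\hat M}\big)^3 + \beta\Big)}.$$
By rescaling $\hat y = \frac{y}{\Lambda/(K\hat M)}$ and $\hat s_0 = \frac{s_0}{\Lambda/(K\hat M)}$, we obtain
$$f(y) \le \frac{\frac{\hat M + (1-2\hat M)\hat y}{\hat y}(3\hat y)^3 + \frac{(3\hat M-1)\hat y - \hat M}{\hat y}\cdot\frac{2\hat y}{\hat s_0}\,\hat s_0^3}{\hat M} = g(\hat y).$$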
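By solving $g'(\hat y) = 0$, given $2\hat y \le 1 \le 3\hat y$, we derive
$$\hat y = \frac{3\hat M + \sqrt{9\hat M^2 + 2(2\hat M-1)(3\hat M-1)\hat s_0^2}}{9(2\hat M-1)} \qquad (4.7)$$
With $\delta = 2\hat M - 1$, this gives
$$f(\hat y) \le \frac{54\hat M^3 - 18\hat M\big(2\delta(3\hat M-1) - 3\hat M\delta\big)\hat s_0^2 + 2\big(9\hat M^2 + 2\delta(3\hat M-1)\hat s_0^2\big)^{3/2}}{27\hat M\delta^2},$$
and taking $\hat s_0 = 1$, we obtain
$$f(\hat y) \le \frac{54\hat M^3 - 18\hat M\big(2\delta(3\hat M-1) - 3\hat M\delta\big) + 2\big(9\hat M^2 + 2\delta(3\hat M-1)\big)^{3/2}}{27\hat M\delta^2} \qquad (4.8)$$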
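Similarly, for the second sub-case, where $1 \le \hat s_0 \le 1.5$, we have
$$f(\hat y) \le \frac{1}{\hat s_0^2}\cdot\frac{54\hat M^3 - 18\hat M\big(2\delta(3\hat M-1) - 3\hat M\delta\big)\hat s_0^2 + 2\big(9\hat M^2 + 2\delta(3\hat M-1)\hat s_0^2\big)^{3/2}}{27\hat M\delta^2},$$
and the last inequality again comes from $\hat s_0 = 1$.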
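For the third case, $s_0 \le 2y$, we have
$$f(y) \le \frac{K\Big(\frac{\frac{\Lambda}{K} + (1-2\hat M)y}{y}\big((3y)^3+\beta\big) + \frac{(3\hat M-1)y - \frac{\Lambda}{K}}{y}\big((2y)^3+\beta\big)\Big)}{K\hat M\Big(\big(\frac{\Lambda}{K\hat M}\big)^3+\beta\Big)}.$$
By solving $f'(\hat y) = 0$, we get $\hat y = \frac{38\hat M}{3(30\hat M - 19)}$, and we obtain
$$f(y) \le \frac{4\cdot 19^3\,\hat M^2}{27\,(30\hat M - 19)^2} \qquad (4.9)$$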
From Equations (4.8) and (4.9), we find that the value of (4.9) is no greater than that of (4.8) when $\hat M \ge 3$; as $\hat M \to \infty$, the bound is about 1.282 in the second case and about 1.13 in the third case.
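The closed forms (4.8) and (4.9) can be checked numerically with a short Python sketch. The function names are hypothetical. `g` is the normalized first-sub-case ratio above with $\hat s_0 = 1$ and β = 0, and `h` is our own simplification of the third-case ratio under the same rescaling ($\Lambda/(K\hat M) = 1$, β = 0), which reduces to $(19\hat M\hat y^2 - (30\hat M-19)\hat y^3)/\hat M$. The sketch evaluates each ratio at its stated stationary point and compares it with the corresponding closed form.

```python
import math


def bound_case2(M: float) -> float:
    """Closed form (4.8); delta = 2M - 1 and the factor (3M - 1) appear explicitly."""
    d, e = 2 * M - 1, 3 * M - 1
    num = 54 * M**3 - 18 * M * (2 * d * e - 3 * M * d) + 2 * (9 * M**2 + 2 * d * e) ** 1.5
    return num / (27 * M * d**2)


def bound_case3(M: float) -> float:
    """Closed form (4.9)."""
    return 4 * 19**3 * M**2 / (27 * (30 * M - 19) ** 2)


def g(y: float, M: float) -> float:
    """Normalized first-sub-case ratio with s0_hat = 1 and beta = 0."""
    return ((M + (1 - 2 * M) * y) / y * (3 * y) ** 3 + ((3 * M - 1) * y - M) / y * 2 * y) / M


def h(y: float, M: float) -> float:
    """Normalized third-case ratio with beta = 0 (our simplification)."""
    return (19 * M * y**2 - (30 * M - 19) * y**3) / M


def y_star_case2(M: float) -> float:
    """Stationary point (4.7) with s0_hat = 1."""
    d, e = 2 * M - 1, 3 * M - 1
    return (3 * M + math.sqrt(9 * M**2 + 2 * d * e)) / (9 * d)


def y_star_case3(M: float) -> float:
    """Stationary point of the third case."""
    return 38 * M / (3 * (30 * M - 19))


for M in (3, 10, 10**6):
    print(M, round(bound_case2(M), 4), round(bound_case3(M), 4))
```

Running the loop prints both closed forms side by side for a few block counts, and `g(y_star_case2(M), M)` agrees with `bound_case2(M)` (likewise for `h` and `bound_case3`) up to floating-point error.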
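Theorem 1. For a voltage-island chip multi-core processor with $\hat M$ blocks, the approximation ratio of Algorithm VILCF for the VIEES problem with $E_{sw} = 0$ is
$$\frac{54\hat M^3 - 18\hat M\big(2\delta(3\hat M-1) - 3\hat M\delta\big) + 2\big(9\hat M^2 + 2\delta(3\hat M-1)\big)^{3/2}}{27\hat M\delta^2},$$
where $\hat M \ge 3$ and $\delta = 2\hat M - 1$.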
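Proof. $\Phi(SC_{VILCF})$ equals $\sum_{m=0}^{M-1}K\,\psi(l_{m,0})$ and $\Phi(SC^*)$ equals $\sum_{m=0}^{M-1}K\,\psi(l'_{m,0})$. Since SC* is a lower bound of the optimal solution of the VIEES problem, the approximation ratio A is $\sum_{m=0}^{M-1}K\,\psi(l_{m,0}) \big/ \sum_{m=0}^{M-1}K\,\psi(l'_{m,0})$. If $l_{m,0} = l'_{m,0}$ for all $0 \le m \le M-1$, then A = 1; otherwise, let $\hat M$ be the smallest index satisfying $l'_{\hat M,0} \ne l_{\hat M,0}$. We have
$$\Phi(SC_{VILCF}) = \sum_{m=0}^{M-1}K\,\psi(l_{m,0})\,\delta^{m<\hat M} + \sum_{m=0}^{M-1}K\,\psi(l_{m,0})\,\delta^{m\ge\hat M}$$
and
$$\Phi(SC^*) = \sum_{m=0}^{M-1}K\,\psi(l'_{m,0})\,\delta^{m<\hat M} + \sum_{m=0}^{M-1}K\,\psi(l'_{m,0})\,\delta^{m\ge\hat M}_{l'_{m,0}>\lambda} + \sum_{m=0}^{M-1}K\,\psi(l'_{m,0})\,\delta^{m\ge\hat M}_{l'_{m,0}=\lambda},$$
where $\delta^a_b$ is 1 if both a and b are true and 0 otherwise (and $\delta^a$ is 1 if a is true and 0 otherwise). We know that $\sum_{m=0}^{M-1}K\,\psi(l_{m,0})\,\delta^{m<\hat M} = \sum_{m=0}^{M-1}K\,\psi(l'_{m,0})\,\delta^{m<\hat M}$. We obtain
$$A = \frac{\sum_{m=0}^{M-1}\big(\psi(l_{m,0})\delta^{m<\hat M} + \psi(l_{m,0})\delta^{m\ge\hat M}\big)}{\sum_{m=0}^{M-1}\big(\psi(l'_{m,0})\delta^{m<\hat M} + \psi(l'_{m,0})\delta^{m\ge\hat M}_{l'_{m,0}>\lambda} + \psi(l'_{m,0})\delta^{m\ge\hat M}_{l'_{m,0}=\lambda}\big)}.$$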
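Considering the blocks m with $m < \hat M$, we have
$$\sum_{m=0}^{M-1}\sum_{k=0}^{K-1} l_{m,k}\,\delta^{m<\hat M} \le \sum_{m=0}^{M-1}\sum_{k=0}^{K-1} l'_{m,k}\,\delta^{m<\hat M}$$
since $l_{m,k} \le l'_{m,k}$, so we obtain
$$\sum_{m=0}^{M-1}\sum_{k=0}^{K-1} l_{m,k}\,\delta^{m\ge\hat M} \ge \sum_{m=0}^{M-1}\sum_{k=0}^{K-1}\big(l'_{m,k}\,\delta^{m\ge\hat M}_{l'_{m,k}>\lambda} + l'_{m,k}\,\delta^{m\ge\hat M}_{l'_{m,k}=\lambda}\big).$$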
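We know that
$$\sum_{m=0}^{M-1}\sum_{k=0}^{K-1} l_{m,k}\,\delta^{m\ge\hat M} - \sum_{m=0}^{M-1}\sum_{k=0}^{K-1}\big(l'_{m,k}\,\delta^{m\ge\hat M}_{l'_{m,k}>\lambda} + l'_{m,k}\,\delta^{m\ge\hat M}_{l'_{m,k}=\lambda}\big)$$
is no greater than $\sum_{m=0}^{M-1}\sum_{k=0}^{K-1}(l^{max}_m - l'_{m,k})\,\delta^{m<\hat M}$, because at most this amount of load from $T\setminus T^f$ is distributed into the blocks whose indices are smaller than $\hat M$ in SC*.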
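Let $\Lambda = \sum_{m=0}^{M-1}\sum_{k=0}^{K-1} l_{m,k}\,\delta^{m\ge\hat M}$. We know that $\sum_{m=0}^{M-1}\sum_{k=0}^{K-1}(l^{max}_m - l'_{m,k})\,\delta^{m<\hat M}$ is no greater than $(K-1)(l_{0,0} - l_{\hat M,0})$. So we have
$$\Lambda - (K-1)(l_{0,0} - l_{\hat M,0}) \le \sum_{m=0}^{M-1}\sum_{k=0}^{K-1}\big(l'_{m,k}\,\delta^{m\ge\hat M}_{l'_{m,k}>\lambda} + l'_{m,k}\,\delta^{m\ge\hat M}_{l'_{m,k}=\lambda}\big),$$
and we obtain
$$A \le \frac{\sum_{m=0}^{M-1}\big(\psi(l_{m,0})\delta^{m<\hat M} + \psi(l_{m,0})\delta^{m\ge\hat M}\big)}{\sum_{m=0}^{M-1}\psi(l'_{m,0})\delta^{m<\hat M} + (M-\hat M)\,\psi\Big(\frac{\Lambda/K - (l_{0,0} - l_{\hat M,0})}{M - \hat M}\Big)} \qquad (4.10)$$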
Let $l_{min}$ be the minimum load of the cores derived from Algorithm VILCF. Consider Λ in a schedule attaining the upper bound of energy consumption: by Lemma 9, for $m \ge \hat M$, each of the loads $l_{\hat M+1,0}, \ldots, l_{M-1,0}$ is assigned to K cores, only one core is assigned $l_{\hat M,0}$, and K − 1 cores are assigned $l_{min}$. So, in an upper-bound schedule,
$$\Lambda = \sum_{m=\hat M}^{M-1} K\,l_{m,0} - (K-1)(l_{\hat M,0} - l_{min}).$$
With Lemma 8, we know that $l_{min} \le l_{m,0} \le \frac32 l_{min}$ for any m-th block with $m \ge \hat M$, so
$$\Lambda \ge \sum_{m=\hat M}^{M-1} K\,l_{m,0} - (K-1)\tfrac12 l_{min}.$$
Let $\gamma_m$ be the value satisfying $\gamma_m l_{min} + (1-\gamma_m)\frac32 l_{min} = l_{m,0}$. For any m-th block with $m \ge \hat M$, since $l_{min} \le l_{m,0} \le \frac32 l_{min}$, $\gamma_m$ is a real number between 0 and 1, and $\psi(l_{m,0}) \le \gamma_m\psi(l_{min}) + (1-\gamma_m)\psi(\frac32 l_{min})$. Taking $y = l_{min}/2$, we derive
$$\Lambda \ge K\big(\mu\cdot 2y + (M - \hat M - \mu)\cdot 3y\big) - (K-1)y \qquad (4.11)$$
where $0 \le \mu \le \hat M$.
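By Lemma 2, we know that
$$\sum_{m=0}^{M-1}\psi(l_{m,0})\delta^{m\ge\hat M} \le \sum_{m=0}^{M-1}\Big(\hat\gamma_m\psi(l_{min}) + (1-\hat\gamma_m)\psi\big(\tfrac32 l_{min}\big)\Big)\delta^{m\ge\hat M} \le \mu\psi(2y) + (M-\hat M-\mu)\psi(3y),$$
and by Equations (4.10) and (4.11) we get
$$A \le \frac{\sum_{m=0}^{M-1}\psi(l_{m,0})\delta^{m<\hat M} + \mu\psi(2y) + (M-\hat M-\mu)\psi(3y)}{\sum_{m=0}^{M-1}\psi(l'_{m,0})\delta^{m<\hat M} + (M-\hat M)\,\psi\Big(\frac{\Lambda/K - (l_{0,0}-l_{\hat M,0})}{M-\hat M}\Big)}.$$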
For $0 \le m \le M-2$, $l_{m,0} - l_{m+1,0}$ is less than the minimum non-completing task $\tau_{min} \in T\setminus T^f$. When $\tau_{min}$ is assigned to core (m, k), we know that $\tau_{min} + l_{m,k} \le 3y$ and $\tau_{min} \le \frac12 l_{m,k}$, so $\tau_{min} \le y$; hence $l_{m,0} - l_{m+1,0} \le y$ and $l_{0,0} - l_{\hat M,0} \le \hat M y$.
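Since
$$\frac{\mu\psi(2y) + (M-\hat M-\mu)\psi(3y)}{(M-\hat M)\,\psi\Big(\frac{\Lambda/K - (l_{0,0}-l_{\hat M,0})}{M-\hat M}\Big)} \ge 1$$
and $l_{m,0} \ge 3y$ for $m < \hat M$, we get $\sum_{m=0}^{M-1}\psi(l_{m,0})\delta^{m<\hat M} \ge \hat M\psi(3y)$, and therefore
$$A \le \frac{\hat M\psi(3y) + \mu\psi(2y) + (M-\hat M-\mu)\psi(3y)}{\hat M\psi(3y) + (M-\hat M)\,\psi\Big(\frac{\Lambda/K - \hat M y}{M-\hat M}\Big)}.$$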
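Considering the denominator, we have $\hat M\cdot 3y + (M-\hat M)\cdot\frac{\Lambda/K - \hat M y}{M-\hat M} \ge M\cdot\frac{\Lambda}{MK}$, so by Lemma 3, $\hat M\psi(3y) + (M-\hat M)\,\psi\big(\frac{\Lambda/K-\hat M y}{M-\hat M}\big) \ge M\,\psi\big(\frac{\Lambda}{KM}\big)$. Hence
$$A \le \frac{\mu\psi(2y) + (M-\mu)\psi(3y)}{M\,\psi\big(\frac{\Lambda}{KM}\big)},$$
and the theorem follows from Lemma 10.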
Corollary 1. The approximation ratio of Algorithm VILCF for the VIEES problem is $\frac{4\cdot 19^3\,\hat M^2}{27(30\hat M-19)^2}$ when β and $S_{min}$ are both 0 and the number of blocks is no less than 3.
4.6 Simulation Results
In this section, we evaluate the energy efficiency of the proposed algorithm VILCF under different block partitions. In addition, we compare the performance of VILCF with that of Largest Task First (LTF) [38].
The workload parameters and performance metrics are set as follows. The power consumption function is $P(s) = s^3 + \beta$, where s is the speed of a core and β is the static power consumption; β varies between 1 and 5. The task sizes are integer random variables between 1 and 6. The minimum speed $S_{min}$ is set to 0, the hyper-period L is 1, and the switching overhead $E_{sw}$ is 0. The number of cores is fixed at 128. As the number of blocks (voltage islands) increases from $2^1$ to $2^5$, the number of cores in each block decreases from $2^6$ to $2^2$, respectively. Results are obtained from 128 independent runs for each configuration.
The normalized energy of an algorithm is defined as the ratio of the energy consumption of the derived schedule to that of SC*, the optimal schedule of the Semi-VIEES relaxation. Since VIEES is NP-hard, the true optimum is hard to compute, and since SC* is a lower bound on it, the normalized energy is an upper bound on the actual approximation ratio. The smaller the normalized energy, the better the approximation. As expected, the normalized energy decreases as the number of blocks increases (see Figure 4.5).
[Plot: energy consumption approximation (y-axis) versus voltage island block number (x-axis, 0 to 30)]
Figure 4.5: The energy consumption approximation versus the number of blocks
We compare VILCF with LTF [38]. In both algorithms, the tasks are sorted first, and the largest task is scheduled first. With LTF, a task is assigned to the core with the minimum load, whereas with VILCF, a task is assigned to the core with the largest capacity. The other difference is that after each assignment, VILCF sorts the partitioned task subsets and reassigns them to cores, whereas LTF does not re-sort. In Figure 4.6, the task set size ranges from 150 to 550, the number of blocks varies from $2^1$ to $2^5$, and β is 2. VILCF significantly outperforms LTF in terms of normalized energy (see Figure 4.6): on average, the normalized energy of VILCF is about 1.06, while that of LTF is 1.17.
[Plot: average normalized energy of VILCF and LTF versus number of tasks (150 to 550) and number of blocks ($2^1$ to $2^5$)]
Figure 4.6: Comparison of normalized energy between VILCF and LTF when the number of cores is 128, the number of blocks ranges over $[2^1, 2^5]$, and the task set size ranges over [150, 550]
[Plot: power consumption approximation versus η, the ratio of the number of tasks to the number of cores, for 2, 4, 8, 16, and 32 blocks]
Figure 4.7: Power consumption approximation when η ranges from 1.2 to 4, stepped by 0.2. β is set to 2.
Figure 4.7 shows the average normalized energy of VILCF when varying η, the ratio of the number of tasks to the number of cores. We vary η from 1.2 to 4 in steps of 0.2. The average normalized energy is worst when η is close to 3. When η is small, most cores are assigned only one task, and the assignment is nearly the same as the optimal schedule. As η increases, the load of each core increases and the average normalized energy increases. Furthermore, the more blocks the chip is partitioned into, the better the energy efficiency (see Figures 4.7 and 4.8).
Figure 4.8 shows the average normalized energy of VILCF when varying β. We vary β from 0 to 5 in steps of 0.5, with η set to 3. As β increases, the average normalized energy decreases slightly, even though the share of the total energy consumption due to leakage current increases greatly.
[Plot: power consumption approximation versus β for 2, 4, and 8 blocks]
Figure 4.8: Power consumption approximation when β ranges from 0 to 5. η is set to 3.
4.7 Summary
This chapter studies energy efficiency in data centers. It presents the Voltage Island Largest Capacity First (VILCF) algorithm for energy-efficient scheduling of periodic real-time tasks on multi-core processors under the voltage island model. VILCF achieves better energy efficiency by fully utilizing the remaining capacity of a voltage island before turning on more voltage islands or increasing the voltage/speed of the currently active voltage islands. When there is no upper bound on the core speeds and the leakage power is negligible, the approximation ratio of VILCF is $\frac{4\cdot 19^3\,\hat M^2}{27(30\hat M-19)^2}$ where $\hat M \ge 3$; for example, this ratio is about 1.81 when $\hat M = 3$ and approaches about 1.13 as $\hat M \to \infty$. Furthermore, when the overhead of turning a processor on or off is negligible, the approximation ratio of VILCF is $\frac{54\hat M^3 - 18\hat M(2\delta(3\hat M-1) - 3\hat M\delta) + 2(9\hat M^2 + 2\delta(3\hat M-1))^{3/2}}{27\hat M\delta^2}$ where $\hat M \ge 3$ and $\delta = 2\hat M - 1$; this ratio approaches about 1.282 as $\hat M \to \infty$. In general, the approximation ratio decreases as the number of blocks increases.
CHAPTER 5
Traffic and Energy Aware Virtual Network Mapping
in Data Centers
5.1 Introduction
This chapter investigates the energy-efficient resource allocation problem for virtual networks in
cloud data centers.
Virtualization is an effective approach to reducing the energy consumed by tenants placed in a data center. A cloud tenant expresses a computation requirement for each virtual machine (VM) and a bandwidth requirement for each pair of VMs. Virtualization can consolidate VMs onto fewer servers (also termed physical machines (PMs)) to increase the utilization of data centers.
However, consolidating the VMs of a tenant onto fewer servers is restricted by server computation resources such as CPU, memory, and storage. Most tenants cannot be placed on a single server and must be placed on multiple servers. VMs of a tenant placed on different servers need to communicate with each other, so the data center needs to allocate network resources to the tenant. Recently, more and more applications and services deployed in data centers require link bandwidth guarantees for their communication, so the VM placement of a tenant is also restricted by network resources.
Most data center networks are designed with redundancy. For example, a fat-tree network uses a multi-rooted topology with full bisection bandwidth. Not all switches need to be active when a tenant is placed in the data center, and unneeded switches can be powered off to save energy.
This chapter studies placing the virtual network abstracted from a tenant into a data center while minimizing the number of active servers and switches under the constraints of server computation resources and network resources.
5.2 The Model
5.2.1 Network Architectures
[Diagram: core, aggregation, and edge tiers; Pod1 to Pod4]
Figure 5.1: Three-tier fat-tree topology
We focus on fat-tree networks, such as PortLand [69], since they are widely deployed in data centers. We assume the data center network in this study is a three-tier homogeneous fat-tree network modeled as a graph G(V, E), where V is the set of vertices and $E \subseteq V\times V$ is the set of edges (see Figure 5.1). The vertices in V are classified into two types: network switches and physical machines (PMs), which are also referred to as servers. The sets of switches and PMs are denoted S and P, respectively; therefore, $V = S\cup P$. An edge $e \in E$ represents a communication link between a PM and a switch, or between a pair of switches.
We assume each switch in a fat-tree network has K ports. There are K pods in the network, each of which has K switches. In pod k, $1 \le k \le K$, K/2 of the switches form the aggregation tier ($AGGR_k$) and the other K/2 form the edge tier ($EDGE_k$). The edge-tier and aggregation-tier switches are connected in a bipartite topology: for switches $i \in EDGE_k$ and $j \in AGGR_k$, we have $e_{i,j} = 1$. Each edge switch $i \in EDGE_k$ is connected to K/2 PMs, which are grouped as $RACK_i$. There are $\frac14 K^2$ core switches in the core tier (CORE) of the data center network, and each aggregation switch in every pod is connected to K/2 core switches. The total number of switches in a three-tier fat-tree network is $\frac54 K^2$. We denote the bandwidth capacity of a link by $C_l$ and the computation capacity of a PM by $C_s$.
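The element counts above can be checked with a short Python sketch. `fat_tree_size` and `FatTreeSize` are hypothetical names; the sketch assumes K is even, as the K/2 splits require, and derives the PM count from the text (K²/2 edge switches, each serving K/2 PMs).

```python
from dataclasses import dataclass


@dataclass
class FatTreeSize:
    pods: int
    edge_switches: int
    aggregation_switches: int
    core_switches: int
    pms: int

    @property
    def total_switches(self) -> int:
        return self.edge_switches + self.aggregation_switches + self.core_switches


def fat_tree_size(K: int) -> FatTreeSize:
    """Element counts of a three-tier fat-tree built from K-port switches."""
    if K <= 0 or K % 2:
        raise ValueError("K must be a positive even number")
    half = K // 2
    return FatTreeSize(
        pods=K,
        edge_switches=K * half,         # K/2 edge switches in each of K pods
        aggregation_switches=K * half,  # K/2 aggregation switches per pod
        core_switches=half * half,      # K^2/4 core switches
        pms=K * half * half,            # each edge switch serves K/2 PMs
    )


size = fat_tree_size(4)
print(size, size.total_switches)
```

For K = 4 (Figure 5.1), this gives 4 core switches and 20 switches in total, matching $\frac54 K^2$.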
A fat-tree network is richly connected to support very high bisection bandwidth. However, its link utilization is often very low when data center networks run well below capacity. To reduce energy consumption, traffic flows can be consolidated onto a smaller number of links and switches, and the unused links and switches can then be powered off.
5.2.2 VM Communication Bandwidth Allocation
In a multi-tenant data center, a tenant's request simply asks for some amount of computation resources such as CPU, memory, and storage. A tenant's computation instances, virtual machines (VMs), always demand some network resources to support communication among them. In this study, a tenant is abstracted as a virtual network using the pipe model, and this virtual network is modeled as a graph $G_v$ with the VMs T as vertices and the traffic matrix M as edges. A tenant can specify a diverse set of computation and bandwidth demands for different VMs, so the modeled graph is weighted: the weight of a vertex represents the computation demand $r_i$ of VM i, and the weight of an edge represents the traffic flow size $m_{i,j}$ between VMs i and j.
The VM placement is restricted by two constraints: the PM computation capacity $C_s$ and the link bandwidth capacity $C_l$. We define the load of PM p as $\sum_i \Delta_{i,p} r_i$; the load of PM p should be no more than its computation capacity, $\sum_i \Delta_{i,p} r_i \le C_s$.
We study VM flow routing in three-tier fat-tree networks and discuss it in four cases. In case 1, a pair of VMs i and j ($i, j \in T$) with flow $m_{i,j}$ are placed on the same PM p, with $\Delta_{i,p} = 1$ and $\Delta_{j,p} = 1$. Their communication is carried out inside PM p, and this intra-PM traffic cost is ignored, since no network resources are required for their communication.
Table 5.1: Notation
T: Set of VMs
M: Traffic matrix of a tenant virtual network
$m_{i,j}$: Bandwidth demand of the flow between VM $i \in T$ and VM $j \in T$
$r_i$: Computation resources required by VM i
$\Delta_{i,p}$: Binary decision for assigning VM i to PM p
$x_s$: Binary decision for switch s being active
$y_p$: Binary decision for PM p being active
$\Omega^k_{m_{i,j}}$: Binary decision for traffic $m_{i,j}$ crossing switch k
$e_{u,v}$: Binary decision for the link between switch u and switch v
$EDGE_k$: Set of switches in the edge tier of pod k
$AGGR_k$: Set of switches in the aggregation tier of pod k
CORE: Set of switches in the core tier
$RACK_k$: Set of PMs sharing edge switch k
W: Set of VM-clusters
D: Inter-VM-cluster traffic matrix
$d_{i,j}$: Bandwidth required by the flow between VM-cluster $i \in W$ and VM-cluster $j \in W$
$\Gamma_{p,s}$: Binary decision for mapping VM-cluster p to a PM in rack s

In case 2, VMs i and j are placed on two different PMs s and t, with $s \ne t$, $\Delta_{i,s} = 1$, and $\Delta_{j,t} = 1$, and PMs s and t share the same edge switch k ($s, t \in RACK_k$). Flow $m_{i,j}$ crosses edge switch k, so we have $\Omega^k_{m_{i,j}} = 1$.
In case 3, VMs i and j are placed on two different PMs s and t, with $s \ne t$, $\Delta_{i,s} = 1$, and $\Delta_{j,t} = 1$; PMs s and t connect to two different edge switches u and v ($s \in RACK_u$, $t \in RACK_v$, $u \ne v$), but they are located in the same pod n ($u, v \in EDGE_n$). In the fat-tree topology, edge switches u and v connect to all K/2 aggregation switches in this pod. One aggregation switch in $AGGR_n$ is selected to route flow $m_{i,j}$, so we obtain $\Omega^u_{m_{i,j}} = 1$, $\Omega^v_{m_{i,j}} = 1$, and $\sum_{k\in AGGR_n}\Omega^k_{m_{i,j}} = 1$.
In case 4, VMs i and j are placed on two different PMs s and t, with $s \ne t$, $\Delta_{i,s} = 1$, and $\Delta_{j,t} = 1$, and PMs s and t are located in different pods m and n. PMs s and t connect to edge switches u and v, respectively ($s \in RACK_u$, $t \in RACK_v$, $u \in EDGE_m$, $v \in EDGE_n$, $m \ne n$); edge switches u and v each select an aggregation switch in their own pod, and these two aggregation switches select a common core switch to route flow $m_{i,j}$. So we obtain $\Omega^u_{m_{i,j}} = 1$, $\Omega^v_{m_{i,j}} = 1$, $\sum_{k\in AGGR_m}\Omega^k_{m_{i,j}} = 1$, $\sum_{k\in AGGR_n}\Omega^k_{m_{i,j}} = 1$, and $\sum_{k\in CORE}\Omega^k_{m_{i,j}} = 1$.
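The four cases can be summarized by a small classifier that, given the PMs hosting VMs i and j, returns the case number and how many switches of each tier the flow crosses. This is a hypothetical helper; it assumes PMs, edge switches (racks), and pods are identified by integers, and the example numbering for K = 4 (PM p under edge switch p // 2, edge switch r in pod r // 2) is our own choice.

```python
from typing import Dict, Tuple


def route_case(pm_i: int, pm_j: int, rack_of: Dict[int, int],
               pod_of: Dict[int, int]) -> Tuple[int, Dict[str, int]]:
    """Routing case (1-4) of flow m_ij and the number of switches crossed per tier."""
    if pm_i == pm_j:  # case 1: intra-PM traffic, no network resources
        return 1, {"edge": 0, "aggregation": 0, "core": 0}
    if rack_of[pm_i] == rack_of[pm_j]:  # case 2: same edge switch
        return 2, {"edge": 1, "aggregation": 0, "core": 0}
    if pod_of[rack_of[pm_i]] == pod_of[rack_of[pm_j]]:  # case 3: same pod
        return 3, {"edge": 2, "aggregation": 1, "core": 0}
    # case 4: one aggregation switch in each pod plus one common core switch
    return 4, {"edge": 2, "aggregation": 2, "core": 1}


# Example numbering for K = 4: 16 PMs, 8 edge switches, 4 pods.
rack_of = {p: p // 2 for p in range(16)}
pod_of = {r: r // 2 for r in range(8)}
print(route_case(0, 4, rack_of, pod_of))
```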
The bandwidth of a link is allocated to the flows that cross it. In a fat-tree network, for two switches u and v, if flow $m_{i,j}$ crosses both of them, with $\Omega^u_{m_{i,j}} = 1$ and $\Omega^v_{m_{i,j}} = 1$, and there is a link between switches u and v with $e_{u,v} = 1$, then flow $m_{i,j}$ must cross link $e_{u,v}$, so the bandwidth requirement of the flow on this link is $e_{u,v}\Omega^u_{m_{i,j}}\Omega^v_{m_{i,j}}m_{i,j}$. We define the load of link $e_{u,v}$ as $\sum_{i,j\in T} e_{u,v}\Omega^u_{m_{i,j}}\Omega^v_{m_{i,j}}m_{i,j}$. The load of link $e_{u,v}$ should be no more than $C_l$: $\sum_{i,j\in T} e_{u,v}\Omega^u_{m_{i,j}}\Omega^v_{m_{i,j}}m_{i,j} \le C_l$.
From the four cases above, we study flow routing in each tier of a fat-tree network. First, we consider the routing of flow $m_{i,j}$ through edge-tier switches. Assume link $e_{p,s}$ connects PM p to edge switch s. For flow $m_{i,j}$, if exactly one of VMs i and j is hosted on PM p, flow $m_{i,j}$ must cross link $e_{p,s}$:
$$\Omega^s_{m_{i,j}} = \Delta_{i,p}(1-\Delta_{j,p}) + \Delta_{j,p}(1-\Delta_{i,p}) \qquad (5.1)$$
where $p \in RACK_s$. The bandwidth that link $e_{p,s}$ must allocate to flow $m_{i,j}$ is $m_{i,j}\big(\Delta_{i,p}(1-\Delta_{j,p}) + \Delta_{j,p}(1-\Delta_{i,p})\big)$, and the load of link $e_{p,s}$ is $\sum_{i,j\in T} m_{i,j}\big(\Delta_{i,p}(1-\Delta_{j,p}) + \Delta_{j,p}(1-\Delta_{i,p})\big)$.
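Second, we consider the routing of flow $m_{i,j}$ through the aggregation-tier switches in pod k, $1 \le k \le K$. For each edge switch $m \in EDGE_k$, let
$$\delta_m = \sum_{p\in RACK_m}\Delta_{i,p}\Big(1-\sum_{p\in RACK_m}\Delta_{j,p}\Big) + \sum_{p\in RACK_m}\Delta_{j,p}\Big(1-\sum_{p\in RACK_m}\Delta_{i,p}\Big),$$
and let
$${}_k = \sum_{m\in EDGE_k}\sum_{p\in RACK_m}\Delta_{i,p}\Big(1-\sum_{m\in EDGE_k}\sum_{p\in RACK_m}\Delta_{j,p}\Big) + \sum_{m\in EDGE_k}\sum_{p\in RACK_m}\Delta_{j,p}\Big(1-\sum_{m\in EDGE_k}\sum_{p\in RACK_m}\Delta_{i,p}\Big).$$
$\delta_m = 1$ means that exactly one of VMs i and j is hosted on a PM in $RACK_m$; ${}_k = 1$ means that exactly one of VMs i and j is hosted on a PM located in pod k.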
If the PMs hosting VMs i and j are both located in pod k, then $\sum_{m\in EDGE_k}\sum_{p\in RACK_m}\Delta_{i,p} = 1$ and $\sum_{m\in EDGE_k}\sum_{p\in RACK_m}\Delta_{j,p} = 1$ (see cases 1, 2, and 3). In cases 1 and 2, $\sum_{m\in EDGE_k}\delta_m = 0$. In case 3, flow $m_{i,j}$ crosses two edge switches, so $\sum_{m\in EDGE_k}\delta_m = 2$, and $m_{i,j}$ must cross one aggregation switch in pod k, so $\sum_{s\in AGGR_k}\Omega^s_{m_{i,j}} = 1$. In case 4, exactly one of the PMs hosting VMs i and j is located in pod k, so $\big(\sum_{m\in EDGE_k}\sum_{p\in RACK_m}\Delta_{i,p}\big)\big(\sum_{m\in EDGE_k}\sum_{p\in RACK_m}\Delta_{j,p}\big) = 0$ and ${}_k = 1$, and flow $m_{i,j}$ must cross one aggregation switch in pod k. We obtain
$$\sum_{s\in AGGR_k}\Omega^s_{m_{i,j}} = \Big(\frac12\Big(\sum_{m\in EDGE_k}\sum_{p\in RACK_m}\Delta_{i,p}\Big)\Big(\sum_{m\in EDGE_k}\sum_{p\in RACK_m}\Delta_{j,p}\Big) + {}_k\Big)\sum_{m\in EDGE_k}\delta_m \qquad (5.2)$$
Equation (5.2) indicates that, in cases 3 and 4, flow $m_{i,j}$ crosses one aggregation switch in each pod where the VMs are placed.
Last, we consider the routing of flow $m_{i,j}$ through core-tier switches:
$$\sum_{s\in CORE}\Omega^s_{m_{i,j}} = \frac12\sum_{k=1}^{K}{}_k \qquad (5.3)$$
Equation (5.3) indicates that flow $m_{i,j}$ crosses one core switch in case 4.
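Equations (5.2) and (5.3) can be sanity-checked against the case analysis with a small Python sketch. It evaluates both right-hand sides for one flow, summing (5.2) over all pods; `tier_crossings` is a hypothetical helper, the variable `pod_indicator` stands for the pod-level quantity defined above for pod k, and the K = 4 numbering (PM p under edge switch p // 2, edge switch r in pod r // 2) is our own example.

```python
from typing import Dict


def tier_crossings(pm_i: int, pm_j: int, rack_of: Dict[int, int],
                   pod_of: Dict[int, int], num_pods: int) -> Dict[str, float]:
    """Evaluate Eq. (5.2), summed over pods, and Eq. (5.3) for one flow m_ij."""
    racks = set(rack_of.values())
    # delta_m = 1 iff exactly one of VMs i, j sits in rack m
    delta = {m: (rack_of[pm_i] == m) != (rack_of[pm_j] == m) for m in racks}
    agg_total = 0.0
    core_total = 0.0
    for k in range(num_pods):
        in_i = 1 if pod_of[rack_of[pm_i]] == k else 0
        in_j = 1 if pod_of[rack_of[pm_j]] == k else 0
        pod_indicator = in_i * (1 - in_j) + in_j * (1 - in_i)  # exactly one VM in pod k
        delta_sum = sum(delta[m] for m in racks if pod_of[m] == k)
        agg_total += (0.5 * in_i * in_j + pod_indicator) * delta_sum  # Eq. (5.2)
        core_total += 0.5 * pod_indicator                              # Eq. (5.3)
    return {"aggregation": agg_total, "core": core_total}


rack_of = {p: p // 2 for p in range(16)}
pod_of = {r: r // 2 for r in range(8)}
print(tier_crossings(0, 4, rack_of, pod_of, num_pods=4))
```

For a case-3 pair the sketch reports one aggregation switch and no core switch; for a case-4 pair it reports one aggregation switch per pod and one core switch, matching the case analysis.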
5.2.3 Problem Definition
Now we define the problem of traffic and power aware virtual network mapping (TPVNM). Given a data center with a three-tier homogeneous fat-tree network G = (V, E) and a set of VMs T with traffic matrix M, we place the VMs onto PMs and assign traffic flows to links and switches so as to minimize the number of active PMs and switches, subject to the PM computation capacity $C_s$ and the link bandwidth capacity $C_l$, while guaranteeing the computation requirement of each VM and the bandwidth requirement of each pair of VMs.
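Minimize
$$\sum_{s\in S} x_s + \sum_{p\in P} y_p \tag{5.4a}$$
Subject to
$$\sum_{i\in T}\Delta_{i,p}\, r_i \le C_s\, y_p, \tag{5.4b}$$
$$\sum_{p\in P}\Delta_{i,p} = 1, \tag{5.4c}$$
$$\sum_{i,j\in T} m_{i,j}\big(\Delta_{i,p}(1-\Delta_{j,p}) + \Delta_{j,p}(1-\Delta_{i,p})\big) \le \Omega^k_{m_{i,j}}\, x_k\, C_l, \quad (p\in RACK_k), \tag{5.4d}$$
$$\sum_{i,j\in T} e_{u,v}\,\Omega^u_{m_{i,j}}\,\Omega^v_{m_{i,j}}\,m_{i,j} \le x_u x_v C_l, \quad (u\in S,\ v\in S), \tag{5.4e}$$
$$\sum_{s\in CORE}\Omega^s_{m_{i,j}} = \frac{1}{2}\sum_{k=1}^{K}\theta_k, \tag{5.4f}$$
$$\sum_{s\in AGGR_k}\Omega^s_{m_{i,j}} = \left(\frac{1}{2}\Big(\sum_{m\in EDGE_k}\sum_{p\in RACK_m}\Delta_{i,p}\Big)\Big(\sum_{m\in EDGE_k}\sum_{p\in RACK_m}\Delta_{j,p}\Big) + \theta_k\right)\sum_{m\in EDGE_k}\delta_m, \quad (1\le k\le K). \tag{5.4g}$$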
Constraint (5.4b) models the computation capacity of a PM on which VMs are placed. Constraint (5.4c) requires every VM to be placed on exactly one PM. Constraint (5.4d) captures the bandwidth capacity of a link between a PM and an edge switch, and Constraint (5.4e) captures the bandwidth capacity of a link between two switches. Constraint (5.4f) ensures that each flow crosses at most one core switch, and Constraint (5.4g) ensures that each flow crosses at most one aggregation switch in a pod.
Theorem 2. TPVNM is an NP-hard problem.
Proof. This can be proved by a reduction from the 3-Partition problem [81], defined as follows: given $n = 3k$ integers $a_1, a_2, \ldots, a_n$ and a threshold $S$ such that $S/4 < a_i < S/2$ and $\sum_{i=1}^{n} a_i = kS$, decide whether the integers can be partitioned into $k$ triples such that each triple sums to $S$. This problem is NP-complete.
The reduction is as follows. Given an instance of 3-Partition, we construct a graph $G$ from the integers $a_i$, $1 \le i \le n$. We break each $a_i$ into a set of smaller integers $a_{i,1}, a_{i,2}, \ldots, a_{i,m}$ such that $\sum_{j=1}^{m} a_{i,j} = a_i$ and $m \ge 1$, and we create a fully connected weighted subgraph for $a_i$ with $m$ vertices whose weights are $a_{i,1}, a_{i,2}, \ldots, a_{i,m}$, respectively. An instance of 3-Partition is thus transformed into an instance of TPVNM in polynomial time. If the 3-Partition instance has a solution, TPVNM on $G$ can be solved without cutting any edge; if it has no solution, the optimal TPVNM solution on $G$ cuts at least one edge.
5.3 The Solution
Since TPVNM is NP-hard, we develop heuristic algorithms for it. We divide our solution into two phases: Virtual Machine (VM) Packing and Virtual Network (VN) Mapping.
Placing VMs into the DCN amounts to cutting graph $G_v$ into subgraphs and placing them onto PMs under the two constraints $C_s$ and $C_l$ while minimizing the number of PMs and switches. If the placement is restricted mainly by PM CPU capacity, we say the virtual network is Computation-Intensive; if it is restricted mainly by link bandwidth capacity, we say the virtual network is Communication-Intensive. Computation-Intensive virtual networks require more CPU cycles, while Communication-Intensive virtual networks require more link bandwidth for their placement.
The placement of a Computation-Intensive virtual network is a bin packing problem, since the VM CPU requirements dominate the placement. It can be solved by the First Fit Decreasing (FFD) algorithm, whose approximation ratio for the number of PMs is 1.22 [82]. In the worst case all switches are active; a fat-tree network built from $K$-port switches has $5K^2/4$ switches and $K^3/4$ PMs in total. Assuming that the optimal solution needs no active switch, the approximation ratio of FFD for VM placement is

$$1.22\cdot\frac{K^3/4 + 5K^2/4}{K^3/4} = 1.22\Big(1+\frac{5}{K}\Big).$$

For $K = 48$, the approximation ratio is about 1.35.
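The approximation figure above is plain arithmetic. A short Python check reproduces it, assuming the switch and server counts of a standard $K$-port fat tree ($K^2/4$ core, $K^2/2$ aggregation, and $K^2/2$ edge switches; $K^3/4$ servers). The function name is hypothetical, and 1.22 is the FFD ratio quoted from [82] (the classical 11/9 bound).

```python
def ffd_placement_ratio(K: int, ffd_ratio: float = 1.22) -> float:
    """Worst-case ratio from the text: FFD's PM bound scaled by (PMs + switches) / PMs."""
    pms = K ** 3 / 4           # servers in a K-port fat tree
    switches = 5 * K ** 2 / 4  # K^2/4 core + K^2/2 aggregation + K^2/2 edge switches
    return ffd_ratio * (pms + switches) / pms  # equals ffd_ratio * (1 + 5 / K)


print(round(ffd_placement_ratio(48), 2))  # 1.35
```

The ratio approaches the plain FFD bound as $K$ grows, because the switch count grows as $K^2$ while the server count grows as $K^3$.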
We now consider the more complex case of Communication-Intensive VM placement. In this case, our problem becomes the Quadratic Assignment Problem (QAP), which is known to be NP-hard. In fact, QAP is one of the most difficult problems in the NP-hard class and cannot even be approximated efficiently within a constant factor. The best option is to host heavily communicating VMs on PMs located close to each other.
In this study, we use the traffic matrix $M$ to identify traffic patterns with heavy communication, consolidate the VMs onto a minimum number of PMs while minimizing the number of switches, and then turn off the unused PMs and switches to save power. We divide our solution into two phases: VM packing and VN mapping.
5.3.1 Traffic-Aware VM Packing
In the first phase, we cut graph $G_v$ into subgraphs and pack the VMs of each subgraph into a VM-cluster. A VM-cluster is a subset $T' \subset T$ of VMs with high intra-cluster traffic, and $T - T'$ is the subset of VMs in $T$ outside the cluster. We define the inter-cluster traffic of $T'$ as the traffic between $T'$ and $T - T'$, i.e., $\sum_{i\in T', j\in T-T'} m_{i,j}$. Each VM-cluster is hosted on a single PM. Therefore, we can minimize the number of PMs by minimizing the number of VM-clusters under the constraints $C_s$ and $C_l$, as formulated in (5.5):
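Minimize
$$\sum_{p\in P} y_p \tag{5.5a}$$
Subject to
$$\sum_{i\in T}\Delta_{i,p}\, r_i \le C_s\, y_p, \tag{5.5b}$$
$$\sum_{p\in P}\Delta_{i,p} = 1, \tag{5.5c}$$
$$\sum_{i,j\in T} m_{i,j}\big(\Delta_{i,p}(1-\Delta_{j,p}) + \Delta_{j,p}(1-\Delta_{i,p})\big) \le C_l. \tag{5.5d}$$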
This is an instance of the Multi-Capacity Bin Packing Problem [83], which is NP-complete. Unlike in ordinary bin packing, in traffic-aware VM packing the cumulative bandwidth of a cluster of VMs hosted on a server can be smaller than the sum of the individual VM bandwidths, as shown in Constraint (5.5d). This is because co-located VMs communicate through shared memory and require no bandwidth from the server's network adapter.
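To see why Constraint (5.5d) departs from ordinary bin packing, the sketch below compares, for a hypothetical three-VM traffic matrix, the naive sum of the member VMs' bandwidths with the traffic that actually leaves a two-VM cluster (each unordered flow counted once). The function names are illustrative assumptions.

```python
def naive_bandwidth(cluster, traffic):
    """Sum of each member VM's total traffic, as if no two VMs were co-located."""
    return sum(sum(traffic[i]) for i in cluster)


def cluster_bandwidth(cluster, traffic):
    """Traffic that crosses the host adapter: only flows between the cluster and the rest."""
    members = set(cluster)
    return sum(traffic[i][j] for i in members
               for j in range(len(traffic)) if j not in members)


M = [[0, 8, 1],
     [8, 0, 2],
     [1, 2, 0]]
print(naive_bandwidth({0, 1}, M), cluster_bandwidth({0, 1}, M))  # 19 3
```

The heavy 0-1 flow disappears from the adapter once VMs 0 and 1 share a PM, which is exactly why merging heavily communicating VMs relaxes the $C_l$ constraint.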
For a Communication-Intensive virtual network, link bandwidth is the bottleneck for consolidating VMs onto PMs. Therefore, to minimize the number of PMs, we must minimize the inter-VM-cluster traffic.
We propose Algorithm VM-Packing (see Algorithm 2), which identifies VM-clusters from the traffic matrix $M$ while minimizing the inter-VM-cluster traffic. Algorithm VM-Packing cuts $G_v$ into VM-clusters that can be held by PMs under the constraints $C_s$ and $C_l$. Once the VM-clusters are mapped onto PMs, all inter-VM-cluster traffic is carried by the data center network, while intra-VM-cluster traffic consumes no network bandwidth.
Algorithm VM-Packing has two steps: the first step (Lines 3-21) identifies traffic patterns from the traffic matrix $M$, and the second step (Lines 22-33) packs the traffic patterns into VM-clusters that can be held by PMs under the constraints $C_s$ and $C_l$. The first step consists of two operations: merge and partition. In the merge operation (Lines 4-8), we sort the flows $m_{i,j}$, $i, j \in T$, by size in non-increasing order. We take the largest flow $m_{u,v} = \max_{i,j\in T} m_{i,j}$ and its two endpoint VMs $u$ and $v$; if $r_u + r_v \le C_s$ and $\sum_{i\in T} m_{u,i} + \sum_{i\in T} m_{v,i} - 2m_{u,v} \le C_l$, we merge VMs $u$ and $v$ into a new super VM and recompute the traffic matrix $M$. This newly created super VM is a traffic pattern. We repeat this operation, recomputing $M$ and updating the virtual network graph $G_v$ each time, until no VM pair can be merged.
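The merge operation can be sketched in Python as follows. This is an illustrative reading of Lines 4-8, not the author's code: the traffic matrix is assumed to be a symmetric adjacency dictionary, VM names are tuples so that a super VM lists its members, and when the largest flow violates $C_s$ or $C_l$ the sketch moves on to the next largest flow (our interpretation of "until no VM pair can be merged").

```python
def merge_operation(cpu, traffic, Cs, Cl):
    """Greedy merge step: fuse the heaviest mergeable VM pair until none is left.

    cpu:     {vm: r_vm}
    traffic: {vm: {neighbor: m}}, symmetric, no self-loops
    VM names are tuples of original VM ids, so a super VM lists its members.
    """
    cpu = dict(cpu)
    traffic = {v: dict(nbrs) for v, nbrs in traffic.items()}
    while True:
        # Flows sorted by size, largest first (each unordered pair once).
        flows = sorted(((m, u, v) for u in traffic
                        for v, m in traffic[u].items() if u < v), reverse=True)
        for m_uv, u, v in flows:
            external = sum(traffic[u].values()) + sum(traffic[v].values()) - 2 * m_uv
            if cpu[u] + cpu[v] <= Cs and external <= Cl:
                break
        else:
            return cpu, traffic  # no VM pair can be merged
        w = tuple(sorted(u + v))  # the new super VM
        cpu[w] = cpu.pop(u) + cpu.pop(v)
        merged = {}
        for x in (u, v):  # w inherits every edge of u and v except (u, v)
            for y, m in traffic.pop(x).items():
                if y not in (u, v):
                    merged[y] = merged.get(y, 0) + m
                    del traffic[y][x]
        traffic[w] = merged
        for y, m in merged.items():
            traffic[y][w] = m


cpu = {(1,): 1, (2,): 1, (3,): 3}
traffic = {(1,): {(2,): 10, (3,): 2},
           (2,): {(1,): 10, (3,): 1},
           (3,): {(1,): 2, (2,): 1}}
cpu2, traffic2 = merge_operation(cpu, traffic, Cs=4, Cl=5)
print(sorted(cpu2), traffic2[(1, 2)])  # [(1, 2), (3,)] {(3,): 3}
```

Here VMs 1 and 2 merge (CPU 2, external traffic 3), after which the remaining pair would need CPU 5 > 4, so the loop stops.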
After the merge operation, we perform the partition operation (Lines 9-20). We generate a Maximum Spanning Tree ($MST$) of the merged graph $G_v$, and let $M^{MST}$ be the traffic matrix of $MST$. We sort the flows $m^{MST}_{i,j}$, $i, j \in T$, by size in non-decreasing order and remove the smallest edge $e_{u,v} = \min_{i,j\in T} m^{MST}_{i,j}$, thereby partitioning $MST$ into two subtrees $MST_1$ and $MST_2$, where $T_1$ and $T_2$ are the VM sets of $MST_1$ and $MST_2$, respectively. For each subtree $MST_k$ with VM set $T_k$, if $\sum_{i\in T_k} r_i \le C_s$ and the inter-pattern traffic of $T_k$, $\sum_{i\in T_k, j\in T-T_k} m_{i,j}$, is at most $C_l$, we merge $T_k$ into a new super VM; otherwise, we further partition $MST_k$ by removing its smallest edge $e_{u,v} = \min_{i,j\in T_k} m^{MST_k}_{i,j}$, until the CPU requirement of each resulting pattern is at most $C_s$ and its inter-pattern traffic is at most $C_l$. We then recompute the traffic matrix $M$ of the updated $G_v$. We repeat the merge and partition operations until no new super VM is created in the first step. We then move to the second step, which packs the VM patterns into VM-clusters using the First-Fit packing algorithm.
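A corresponding sketch of the partition operation (Lines 9-20) builds a maximum spanning tree with Kruskal's algorithm, an assumed choice since the text does not name one, and then repeatedly removes the lightest tree edge from any subtree that violates $C_s$ or $C_l$. Edges are hypothetical `(weight, u, v)` triples over integer VM ids; as a further assumption, a single VM that still violates a constraint is kept as its own pattern.

```python
from collections import deque


def max_spanning_forest(nodes, edges):
    """Kruskal's algorithm on edges sorted by decreasing weight; edges are (w, u, v)."""
    parent = {x: x for x in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def components(nodes, tree_edges):
    """Vertex sets of the trees in a forest."""
    adj = {x: [] for x in nodes}
    for _, u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, comps = set(), []
    for s in nodes:
        if s in seen:
            continue
        comp, stack = set(), [s]
        seen.add(s)
        while stack:
            x = stack.pop()
            comp.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        comps.append(comp)
    return comps


def partition_operation(cpu, edges, Cs, Cl):
    """Split the maximum spanning tree at its lightest edges until every pattern fits."""
    tree = max_spanning_forest(list(cpu), edges)
    queue = deque((c, [e for e in tree if e[1] in c]) for c in components(list(cpu), tree))
    patterns = []
    while queue:
        comp, t_edges = queue.popleft()
        cut = sum(w for w, u, v in edges if (u in comp) != (v in comp))  # inter-pattern traffic
        fits = sum(cpu[x] for x in comp) <= Cs and cut <= Cl
        if fits or not t_edges:  # a lone VM is accepted as is
            patterns.append(comp)
        else:
            rest = sorted(t_edges)[1:]  # remove the minimum-weight tree edge
            for sub in components(sorted(comp), rest):
                queue.append((sub, [e for e in rest if e[1] in sub]))
    return patterns


cpu = {v: 1 for v in range(1, 7)}
edges = [(10, 1, 2), (10, 2, 3), (1, 3, 4), (10, 4, 5), (10, 5, 6), (2, 1, 3)]
print(sorted(sorted(p) for p in partition_operation(cpu, edges, Cs=3, Cl=24)))
# [[1, 2, 3], [4, 5, 6]]
```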
For example, consider the virtual network graph with 17 VMs shown in Figure 5.2. Assume the computation capacity $C_s = 12$ and the link bandwidth capacity $C_l = 24$. Let $r_i = 1$ for $1 \le i \le 12$, $r_i = 3$ for $13 \le i \le 17$, and $\sum_{j\in T} m_{i,j} \le C_l$ for $1 \le i \le 17$. In the first round of the merge operation, VMs 13 and 14 are merged, and VMs 15 and 16 are merged (Figure 5.3). We then generate the MST of the graph (Figure 5.4) and run the partition operation, obtaining three new super VMs: {1, 2, 3}, {4, 5, 6}, and {7, 8, 9} (Figure 5.5). In the next round, the merge operation produces two new super VMs, {1, 2, 3, 7, 8, 9, 11} and {4, 5, 6, 10} (Figure 5.6). Finally, the partition operation produces the super VM {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12} (Figure 5.7).
Theorem 3. The complexity of Algorithm VM-Packing is $O(N^3\log^2 N)$, where $N = |T|$.
Figure 5.2: Graph of a tenant virtual network with 17 nodes, with $C_s = 12$ and $C_l = 24$
Figure 5.3: VMs 13 and 14, and VMs 15 and 16, merged into super VMs
Figure 5.4: Maximum spanning tree of the graph
Figure 5.5: Partition into VMs {1, 2, 3}, {4, 5, 6}, and {7, 8, 9}, each then merged into a super VM
Algorithm 2 VM-Packing
Input: Graph $G_v$ with the VMs of $T$ as vertices and the traffic matrix $M$ as edge weights
Output: VM-cluster set $W = \{W_1, W_2, \ldots, W_n\}$ and inter-VM-cluster traffic matrix $D$
1: set $W_i = \emptyset$, $i = 1, \ldots, n$
2: set $D = 0$
3: repeat
4: sort $m_{i,j}$, $i, j \in T$, in non-increasing order // start of merge operation
5: while some VM pair can be merged under the constraints $C_s$ and $C_l$ do
6: merge the VMs $u$ and $v$ with the largest mergeable flow $m_{u,v}$ into a super VM
7: recompute $M$ and sort $m_{i,j}$, $i, j \in T$, in non-increasing order
8: end while // end of merge operation
9: generate a Maximum Spanning Tree $MST$ of graph $G_v$ // start of partition operation
10: enqueue $MST$ into queue $Q$
11: while $Q$ is not empty do
12: dequeue the first element $MST'$ from $Q$, where $T'$ is the VM set of $MST'$
13: if $\sum_{i\in T'} r_i \le C_s$ and the inter-pattern traffic $\sum_{i\in T', j\in T-T'} m_{i,j} \le C_l$ then
14: merge the VMs in $T'$ into a super VM
15: else
16: partition $MST'$ by removing the edge with minimum weight in $MST'$
17: enqueue all resulting subtrees into $Q$
18: end if
19: end while
20: recompute the traffic matrix $M$ by putting the super VMs back into $G_v$ and restoring the removed edges // end of partition operation
21: until no super VM is created
{Pack the VMs with Algorithm FF in the following lines}
22: sort the VMs in $T$ by CPU requirement // $T$ is updated by the merge and partition operations
23: for each VM $i \in T$ do
24: for each bin $j$ do
25: if bin $j$ can hold VM $i$ under the constraints $C_s$ and $C_l$ then
26: $W_j \Leftarrow W_j \cup \{i\}$ and stop searching bins // $W_j$ is the subset of VMs in bin $j$
27: end if
28: end for
29: if no bin can hold VM $i$ then
30: open an empty bin $n$ and set $W_n \Leftarrow \{i\}$
31: end if
32: end for
33: compute the traffic matrix $D$ among the bins
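Lines 22-33 can be sketched in Python as below. The sketch assumes the patterns are sorted by CPU requirement in non-increasing order (the listing only says they are sorted) and checks $C_l$ against the traffic between a bin and every VM outside it; the function names and the toy input are hypothetical.

```python
def first_fit(cpu, traffic, Cs, Cl):
    """First-Fit packing of VM patterns into bins (PMs).

    cpu: {vm: r}; traffic: {vm: {neighbor: m}}, symmetric.
    """
    def external(members):
        # Traffic between the bin and all VMs outside it (the C_l check).
        return sum(m for x in members for y, m in traffic[x].items() if y not in members)

    bins = []
    for vm in sorted(cpu, key=cpu.get, reverse=True):  # assumed: largest CPU first
        for b in bins:
            candidate = b | {vm}
            if sum(cpu[x] for x in candidate) <= Cs and external(candidate) <= Cl:
                b.add(vm)
                break  # first fit: stop at the first bin that can hold the VM
        else:
            bins.append({vm})
    return bins


def inter_bin_traffic(bins, traffic):
    """Traffic matrix D among bins (symmetric)."""
    where = {x: k for k, b in enumerate(bins) for x in b}
    D = [[0] * len(bins) for _ in bins]
    for x, nbrs in traffic.items():
        for y, m in nbrs.items():
            if where[x] != where[y]:
                D[where[x]][where[y]] += m
    return D


cpu = {"a": 2, "b": 2, "c": 1}
traffic = {"a": {"b": 5}, "b": {"a": 5, "c": 1}, "c": {"b": 1}}
bins = first_fit(cpu, traffic, Cs=4, Cl=10)
print([sorted(b) for b in bins], inter_bin_traffic(bins, traffic))  # [['a', 'b'], ['c']] [[0, 1], [1, 0]]
```

The `break` is what makes this First-Fit: without it, a VM could be added to several bins at once.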
Figure 5.6: Second round of merging
Figure 5.7: Second round of partitioning
Proof. In the first step of Algorithm VM-Packing, the merge and partition operations are executed in a loop. The merge operation is itself a loop: sorting the edges in non-increasing order costs $O(N^2\log N)$, after which the largest mergeable edge is merged and the traffic matrix $M$ is updated at cost $O(N)$. The merge loop executes at most $N$ times, so the merge operation costs $O(N^3\log N)$. In the partition operation, generating the MST costs $O(N^2\log N)$, and the loop that removes an edge from the MST executes at most $N$ times; each removal requires checking the subtrees at cost $O(N^2)$, so the partition operation costs $O(N^3)$.
The loop of the first step of Algorithm VM-Packing executes $O(\log N)$ times, which can be shown as follows. Assume the generated MST of graph $G_v$ consists of the edges $e_{r,s} \ge e_{s,t} \ge e_{t,u} \ge e_{u,v} \ge e_{v,w} \le e_{w,x} \le e_{x,y}$, where VM $r$ is the root of the MST and $e_{v,w}$ is the smallest edge. The partition operation removes the smallest edge $e_{v,w}$, dividing the MST into two parts $MST_i$ and $MST_j$, where $MST_j$ consists of $[e_{w,x}, e_{x,y}]$. Assume the VMs in $MST_j$ satisfy the constraints $C_s$ and $C_l$. $MST_j$ contains either 0 or at least 2 edges, because a single edge would already have been merged by the merge operation. Since $MST_j$ has at least two edges, the newly created super VM $w^*$ contains at least three VMs, and the MST becomes $e_{r,s} \ge e_{s,t} \ge e_{t,u} \ge e_{u,v} \ge e_{v,w^*}$.
In the next round, if the merge operation merges no VM, the MST does not change, no new super VM can be created, and the loop stops. For the loop to continue, the merge operation must create at least two new super VMs $v^*$ and $w^*$ that update the MST. Assume the updated MST is $e_{r,s}, e_{s,t}, e_{t,u}, e_{u,v^*}, e_{v^*,w^*}$. If $e_{r,s} \ge e_{s,t} \ge e_{t,u} \le e_{u,v^*} \le e_{v^*,w^*}$, the partition operation can partition the updated MST and create a new super VM. Each of the new super VMs $v^*$ and $w^*$ is formed from VMs at least one of which is a super VM created in the previous round.
In the $n$-th round of the merge and partition loop, for the loop to continue, at least two new super VMs must be created by merging at least four VMs, two of which must be super VMs created in the previous round; hence the number of VMs is reduced by at least $2^n$ in the $n$-th round. Therefore, the loop executes $O(\log N)$ times.
The time cost of Algorithm FF is at most $O(N^3)$, so the complexity of Algorithm VM-Packing is $O(N^3\log^2 N)$.
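The complexity argument can be summarized in two lines of arithmetic, reading the per-round reduction as at least $2^n$ VMs in round $n$ and writing $R$ for the number of rounds:

```latex
% Round count: round n removes at least 2^n VMs, and at most N VMs can be removed.
\sum_{n=1}^{R} 2^{n} = 2^{R+1} - 2 \le N
\;\Longrightarrow\; R \le \log_2(N+2) - 1 = O(\log N)

% Total cost: R rounds of merge plus partition, followed by one First-Fit pass.
O(\log N)\cdot\big(O(N^{3}\log N) + O(N^{3})\big) + O(N^{3}) = O(N^{3}\log^{2} N)
```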
5.3.2 Virtual Network Mapping
After VM packing, we obtain a virtual network $G(W, D)$ with a set of VM-clusters $W$ and the inter-VM-cluster traffic matrix $D$. In the second and third phases, we minimize the number of active switches in the data center network that carry the inter-VM-cluster traffic under the bandwidth guarantee.
In the second phase, we map the virtual network onto the data center network while minimizing the number of active switches. If VM-cluster $p$ is mapped to a PM in $RACK_s$, then $\Gamma^s_p = 1$; otherwise $\Gamma^s_p = 0$. If VM $i$ is packed into VM-cluster $p$, then $\Delta^i_p = 1$; otherwise $\Delta^i_p = 0$. If $\Gamma^s_p\Delta^i_p = 1$, VM $i$ is placed on a PM in $RACK_s$. In a DCN with fat-tree topology, flow $m_{i,j}$ between VMs $i$ and $j$ crosses edge switch $s$ if exactly one of VMs $i$ and $j$ is packed into VM-cluster $p$, so we obtain:

$$\Omega^s_{m_{i,j}} = \Gamma^s_p\big(\Delta^i_p(1-\Delta^j_p) + \Delta^j_p(1-\Delta^i_p)\big) \tag{5.6}$$

where $\Delta^i_p$ and $\Delta^j_p$ indicate whether VMs $i$ and $j$, respectively, are in VM-cluster $p$.
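The problem of minimizing the number of active switches in the data center network under the bandwidth guarantee in the second and third phases can be formulated as:

Minimize
$$\sum_{s\in S} x_s \tag{5.7a}$$
Subject to
$$\Omega^s_{m_{i,j}} = x_s\,\Gamma^s_p\big(\Delta^i_p(1-\Delta^j_p) + \Delta^j_p(1-\Delta^i_p)\big), \quad \big(p\in W,\ s\in EDGE_k,\ 1\le k\le \tfrac{K}{2}\big), \tag{5.7b}$$
$$\sum_{i,j\in T} e_{u,v}\,\Omega^u_{m_{i,j}}\,\Omega^v_{m_{i,j}}\,m_{i,j} \le x_u x_v C_l, \quad (u\in S,\ v\in S), \tag{5.7c}$$
$$\sum_{s\in CORE}\Omega^s_{m_{i,j}} = \frac{1}{2}\sum_{k=1}^{K}\theta_k, \tag{5.7d}$$
$$\sum_{s\in AGGR_k}\Omega^s_{m_{i,j}} = \left(\frac{1}{2}\Big(\sum_{m\in EDGE_k}\sum_{p\in RACK_m}\Delta_{i,p}\Big)\Big(\sum_{m\in EDGE_k}\sum_{p\in RACK_m}\Delta_{j,p}\Big) + \theta_k\right)\sum_{m\in EDGE_k}\delta_m, \quad (1\le k\le K). \tag{5.7e}$$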
The objective of the second phase is to find a mapping of the VM-clusters in $W$ to PMs in the data center that minimizes inter-rack and inter-pod traffic. This is a QAP, which is NP-hard. To minimize the inter-rack and inter-pod traffic crossing the data center network, our solution maps VM-clusters with heavy mutual traffic to PMs located as close together as possible, so that the flows among them cross the fewest switches. Since the data center network is a three-tier fat-tree, the VM-cluster mapping is divided into two levels. In the first level, the VM-clusters in $W$ are mapped to PMs at the rack level. The VM-cluster set $W$ and the inter-VM-cluster traffic matrix $D$ can be modeled as a graph. The first-level mapping cuts this graph into $\lceil 2|W|/K \rceil$ parts, termed rack-level VM-cluster groups $W^+$, while minimizing the traffic between parts, termed inter-rack-level VM-cluster group traffic $D^+$. Each rack-level VM-cluster group has at most $K/2$ VM-clusters, i.e., the group size equals the rack size, and all VM-clusters in a rack-level VM-cluster group are mapped to PMs in the same rack. This is a Balanced Minimum K-cut Problem (BMKP). After the first-level mapping, we obtain a set of rack-level VM-cluster groups $W^+$ and the inter-rack-level VM-cluster group traffic matrix $D^+$, which can also be modeled as a graph. This graph is further cut into $\lceil 2|W^+|/K \rceil$ parts, termed pod-level VM-cluster groups $W^*$, while minimizing the traffic between parts, termed inter-pod-level VM-cluster group traffic $D^*$. All rack-level VM-cluster groups in a pod-level VM-cluster group are mapped to racks in the same pod. Because the fat-tree provides full bisection bandwidth, no congestion occurs in the data center network in this phase. The graph cuts at both levels are similar.
We propose Algorithm VN-Mapping to solve the BMKP described above. We sort the flows $d_{p,q}$, $p, q \in W$, in non-increasing order and take the two VM-clusters (or VM-cluster sets) $u$ and $v$ with $d_{u,v} = \max_{p,q\in W} d_{p,q}$, combining them into a new VM-cluster set. Let $\hat{W}_k$ denote the set of VM-clusters mapped to the $k$-th rack. If neither $u$ nor $v$ has been mapped to a rack, we map both to the rack $m$ that has the largest traffic with them:

$$\sum_{j\in\hat{W}_m}(d_{u,j} + d_{v,j}) = \max_{1\le n\le\lceil 2|W|/K\rceil}\ \sum_{j\in\hat{W}_n}(d_{u,j} + d_{v,j}).$$

If one of $u$ and $v$ is already mapped to a rack, the other is mapped to the same rack. If the rack cannot hold the VM-cluster pair, we continue with the VM-cluster pair with the next largest flow. We repeat this combining operation until all VM-clusters are mapped to racks.
After the first-level VN mapping, we obtain a set of rack-level VM-cluster groups $W^+$ and the inter-rack traffic matrix $D^+$. In the second level, we map the rack-level VM-cluster groups to pods while minimizing the number of active core switches. This is similar to the first-level mapping and can be solved by the same VN-Mapping algorithm.
Algorithm 3 VN-Mapping
Input: VM-cluster set $W$ and traffic matrix of the VM-clusters $D$
Output: Rack-level VM-cluster group set $W^+ = \{\hat{W}_1, \ldots, \hat{W}_n\}$ and traffic matrix of the rack-level VM-cluster groups $D^+$
1: calculate the required number of racks $n = \lceil 2|W|/K \rceil$
2: sort $d_{p,q}$, $p, q \in W$, in non-increasing order
3: while some VM-cluster is not mapped to any rack do
4: find the VM-cluster (or VM-cluster set) pair $u$ and $v$ with the largest traffic between them, $d_{u,v} = \max_{p,q\in W} d_{p,q}$
5: if neither $u$ nor $v$ is mapped to any rack then
6: map $u$ and $v$ to the rack $m$ whose VM-cluster set $\hat{W}_m$ has the largest traffic to them, $\sum_{j\in\hat{W}_m}(d_{u,j}+d_{v,j}) = \max_{1\le r\le n}\sum_{j\in\hat{W}_r}(d_{u,j}+d_{v,j})$, and set $\hat{W}_m \Leftarrow \hat{W}_m \cup \{u, v\}$
7: else if exactly one of $u$ and $v$ is already mapped to a rack then
8: map the other VM-cluster to the same rack
9: end if
10: recompute the traffic matrix $D$
11: sort $d_{p,q}$ in non-increasing order
12: end while
13: compute $D^+$ from $\{\hat{W}_1, \ldots, \hat{W}_n\}$
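A compact Python sketch of the first-level mapping follows. Rather than recomputing $D$ after each combination (Lines 10-11), it scores each candidate rack by its traffic to the pair, which is an assumed simplification; clusters that no pair places are assigned to the first rack with a free slot. Cluster ids, the dictionary layout of `D`, and the function names are hypothetical. Running the same function on rack-level groups (a pod also holds $K/2$ racks) would give the second level.

```python
import math


def vn_mapping(clusters, D, K):
    """Greedy first-level VN-Mapping: put heavily communicating clusters in one rack.

    clusters: list of VM-cluster ids; D: {(p, q): traffic}, one entry per unordered pair.
    A rack holds K // 2 clusters (one per PM), and ceil(2|W|/K) racks are used.
    """
    n_racks = math.ceil(2 * len(clusters) / K)
    cap = K // 2
    racks = [set() for _ in range(n_racks)]
    where = {}

    def d(a, b):
        return D.get((a, b), 0) + D.get((b, a), 0)

    for (u, v), _ in sorted(D.items(), key=lambda kv: kv[1], reverse=True):
        todo = [c for c in (u, v) if c not in where]
        if not todo:
            continue  # both already placed
        if len(todo) == 2:
            # Both unmapped: choose the rack with room that has the most traffic to u and v.
            options = [r for r in range(n_racks) if len(racks[r]) + 2 <= cap]
            if not options:
                continue  # try the pair with the next largest flow
            target = max(options, key=lambda r: sum(d(x, y) for x in todo for y in racks[r]))
        else:
            # One is mapped: put the other in the same rack if it has room.
            target = where[u] if u in where else where[v]
            if len(racks[target]) >= cap:
                continue
        for c in todo:
            racks[target].add(c)
            where[c] = target
    # Clusters without traffic, or whose pairs never fit, fill the remaining slots.
    for c in clusters:
        if c not in where:
            target = next(r for r in range(n_racks) if len(racks[r]) < cap)
            racks[target].add(c)
            where[c] = target
    return racks


def inter_rack_traffic(racks, D):
    """Total traffic that must leave a rack (the quantity summarized by D+)."""
    where = {c: r for r, members in enumerate(racks) for c in members}
    return sum(t for (p, q), t in D.items() if where[p] != where[q])


D = {(0, 1): 10, (2, 3): 8, (1, 2): 1}
racks = vn_mapping([0, 1, 2, 3], D, K=4)
print([sorted(r) for r in racks], inter_rack_traffic(racks, D))  # [[0, 1], [2, 3]] 1
```

Because $\lceil 2|W|/K \rceil \cdot K/2 \ge |W|$ for even $K$, the final fill loop always finds a free slot.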
5.3.3 VM-Pair Flow Routing
After the VMs are placed onto PMs in racks, paths must be chosen for the VM-pair flows in the data center network while minimizing the number of active switches.
Four cases of flow routing were discussed in the model section (Section 5.2). Case 1 needs no network path and is ignored. A case-2 flow crosses only one edge switch, so every edge switch connected to a PM hosting VMs must be active; the minimum number of active edge switches is $\lceil 2|W|/K \rceil$. Flows of cases 3 and 4 must cross aggregation switches and core switches, so switches in the aggregation and core tiers are needed to create paths for them.
Since the fat-tree is a multi-rooted topology with network redundancy, not all aggregation-tier and core-tier switches need to be active. Network traffic can be moved and aggregated onto fewer paths so that the remaining network elements can be put into a dormant state to conserve energy.
Selecting routes for the flows of VM pairs placed in the data center while minimizing the number of active switches is a Multi-Commodity Flow Problem [73], which is NP-hard.
Heller et al. [73] proposed a Greedy Bin-Packing algorithm for flow routing. For each flow, the greedy bin-packer evaluates the possible paths and chooses the leftmost one in the fat-tree (see Figure 5.1) with sufficient capacity. However, for some traffic matrices the greedy approach will not find a satisfying assignment for all flows; this is an inherent problem of any greedy flow assignment strategy, even when the network is provisioned for full bisection bandwidth.
We propose a heuristic algorithm, termed Flow-Routing, that assigns paths to VM-pair flows in a three-tier fat-tree network. Flow-Routing guarantees bandwidth for all flows while minimizing the number of active switches.
Like Heller's Greedy Bin-Packing algorithm, Flow-Routing evaluates the possible paths for each flow and chooses the leftmost one with sufficient capacity, but it does so under the bandwidth guarantee.
We assign indices $1, \ldots, K/2$ to the aggregation switches in each pod from left to right. The $i$-th aggregation switch of every pod is connected to the same $K/2$ core switches, so the core switches are divided into $K/2$ groups; we index the groups from left to right so that each group index equals the index of the aggregation switches it connects to.
Given a set of VMs $T$ with traffic matrix $M$, we also let $M$ denote the set of VM-pair flows, because each element $m_{i,j}$, $i, j \in T$, is a VM-pair flow. We assume that flows $m_{i,j}$ and $m_{j,i}$ are the same. We denote by $M_C$ the set of case-4 flows and by $M_{A_i}$ the set of case-3 flows in pod $i$, $1 \le i \le K$.
Flow-Routing chooses active switches in two steps: it first chooses core switches and then chooses aggregation switches in each pod.
In the first step, Flow-Routing sorts the flows in $M_C$ in non-increasing order. Starting from the leftmost core switch in the data center network, Flow-Routing takes the largest unallocated flow from $M_C$ and checks the residual link bandwidth of the switch. If the switch has sufficient bandwidth for the flow, Flow-Routing allocates link bandwidth of the switch to the flow and uses this switch, together with the aggregation switches connected to it, to create a path for the flow; otherwise, it moves on to the next flow, until the switch has insufficient bandwidth for any remaining flow. If some flows still have no allocated bandwidth, Flow-Routing moves to the next core switch to the right and allocates bandwidth for the remaining flows in the same way. This continues until all flows are allocated bandwidth.
Algorithm 4 Flow-Routing
Input: VM set $T$, already mapped onto PMs in racks, and the VM traffic matrix $M$
Output: Active core switches and aggregation switches
1: put all flows $m_{i,j}$ whose VMs $i$ and $j$ are placed in different pods into set $M_C$
2: sort the flows in $M_C$ in non-increasing order
3: for each core switch $c$ in CORE, from left to right, do
4: for each unallocated flow $m_{p,q}$ in $M_C$, largest first, do
5: if core switch $c$ has enough residual link bandwidth for $m_{p,q}$ then
6: allocate link bandwidth of core switch $c$ to $m_{p,q}$ and mark $m_{p,q}$ as allocated
7: create a path for $m_{p,q}$ using core switch $c$ and the aggregation switches connected to $c$ in the pods hosting VMs $p$ and $q$
8: end if
9: end for
10: if all flows are allocated link bandwidth then
11: exit for
12: end if
13: end for
In the second step, Flow-Routing performs the same procedure in each pod $i$, $1 \le i \le K$, on the flows in $M_{A_i}$, as it does on the flows in $M_C$. Since the fat-tree network has full bisection bandwidth, the bandwidth allocated to each flow is guaranteed and no congestion occurs in the network. The time complexity of Algorithm Flow-Routing is $O(N^2\log N)$, where $N = |T|$ is the size of the VM set.
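The first step of Flow-Routing can be sketched as follows. The sketch assumes the standard fat-tree wiring described above, with core switch $c$ (0-based, left to right) in group $\lfloor c/(K/2) \rfloor$ and one link of capacity $C_l$ to the aggregation switch with that group index in every pod. It checks only the core-to-aggregation links, not the aggregation-to-edge links, and the function and variable names are hypothetical.

```python
def route_core_flows(flows, K, Cl):
    """Pack inter-pod flows onto the leftmost core switches with enough residual bandwidth.

    flows: list of (size, pod_a, pod_b) with pod_a != pod_b.
    Returns the flow-to-core assignment, active core switches,
    active (pod, aggregation index) pairs, and any unrouted flows.
    """
    half = K // 2
    residual = [[Cl] * K for _ in range(half * half)]  # residual[c][pod]
    pending = sorted(range(len(flows)), key=lambda f: flows[f][0], reverse=True)
    assignment = {}
    for c in range(half * half):
        unplaced = []
        for f in pending:  # largest remaining flow first
            size, a, b = flows[f]
            if residual[c][a] >= size and residual[c][b] >= size:
                residual[c][a] -= size
                residual[c][b] -= size
                assignment[f] = c
            else:
                unplaced.append(f)
        pending = unplaced
        if not pending:  # every flow has bandwidth: stop
            break
    active_core = set(assignment.values())
    active_aggr = {(flows[f][side], c // half) for f, c in assignment.items() for side in (1, 2)}
    return assignment, active_core, active_aggr, pending


flows = [(6, 0, 1), (5, 0, 2), (5, 1, 2)]
assignment, core, aggr, left = route_core_flows(flows, K=4, Cl=10)
print(assignment, sorted(core), sorted(aggr), left)
# {0: 0, 1: 1, 2: 1} [0, 1] [(0, 0), (1, 0), (2, 0)] []
```

Core switches 2 and 3, and every aggregation switch with index 1, stay idle in this example and could be put into a dormant state.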
5.4 Performance Evaluation
We evaluate the performance of Algorithm VM-Packing and Algorithm VN-Mapping by simulation. The traffic matrix is generated using the log-normal distribution with the density function in Equation (5.8). In all log-normal tests, the mean $\mu$ of the underlying normal distribution is set to 0 and its standard deviation $\sigma$ is set to 1 or 1.5. Increasing $\sigma$ produces more scattered and higher total traffic between VMs.

$$f(x;\mu,\sigma) = \frac{1}{x\sigma\sqrt{2\pi}}\,e^{-\frac{(\ln x-\mu)^2}{2\sigma^2}} \tag{5.8}$$

We test three edge-weight distributions: two log-normal distributions with $\sigma = 1$ and $\sigma = 1.5$, and one normal distribution in which each edge weight is chosen randomly from $\{1, 2, 3, 4, 5\}$.
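The traffic generation can be sketched with Python's standard `random` module, whose `lognormvariate(mu, sigma)` samples the density in (5.8). The graph topology is not specified in the text; as an assumption, this sketch connects each VM pair independently with a probability chosen so that the mean node degree matches the target. The function names are hypothetical.

```python
import math
import random


def lognormal_pdf(x, mu=0.0, sigma=1.0):
    """Density of Eq. (5.8)."""
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * math.sqrt(2 * math.pi))


def random_traffic(n, mean_degree, sigma, seed=0):
    """Symmetric n x n traffic matrix with log-normal edge weights (mu = 0).

    Assumption: each pair is connected with probability mean_degree / (n - 1)
    (an Erdos-Renyi graph), which gives the requested mean degree on average.
    """
    rng = random.Random(seed)
    p = mean_degree / (n - 1)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                M[i][j] = M[j][i] = rng.lognormvariate(0.0, sigma)
    return M


M = random_traffic(200, 8, sigma=1.5)
degrees = [sum(1 for w in row if w > 0) for row in M]
print(round(lognormal_pdf(1.0), 4), sum(degrees) / len(degrees))
```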
5.4.1 VM Packing
Most existing VM placement solutions ignore VM packing: they simply assume that every VM has the same size and that each PM supports only one VM [13] or a fixed number of VMs [59, 61]. Algorithm VMP [60] is the only other solution that considers VM packing together with VM placement, so we compare Algorithm VM-Packing with Algorithm VMP. Both algorithms identify traffic patterns from the traffic matrix; however, Algorithm VMP lacks the merge step of Algorithm VM-Packing.
We set the PM computation capacity $C_s$ to 200 and the link bandwidth capacity $C_l$ to 100. In the graph of a tenant virtual network, the degree of a node varies from 0 to $|T| - 1$. We assume that every VM $i \in T$ satisfies $r_i \le C_s$ and $\sum_{j\in T} m_{i,j} \le C_l$; otherwise, the tenant's virtual network request may be denied.
Both algorithms output a set of VM-clusters $W$ and an inter-VM-cluster traffic matrix $D$. Since each VM-cluster is hosted on one PM, the number of PMs needed equals $|W|$, and the inter-PM traffic matrix is the same as the inter-VM-cluster traffic matrix. Let $W'$ be the set of PMs hosting VM-clusters and $D'$ the inter-PM traffic matrix produced by Algorithm VMP, and let $W$ be the set of PMs and $D$ the inter-PM traffic matrix produced by Algorithm VM-Packing. We define the PM ratio as $\eta_s = |W'|/|W|$ and the inter-PM traffic ratio as

$$\eta_t = \frac{\sum_{i,j\in W'} d'_{i,j}}{\sum_{i,j\in W} d_{i,j}}.$$
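The two ratios are simple to compute once each algorithm's VM-clusters and traffic matrix are available. The sketch below uses hypothetical outputs and illustrative function names purely to show the arithmetic; a value above 1 means Algorithm VM-Packing uses fewer PMs or produces less inter-PM traffic than Algorithm VMP.

```python
def total_traffic(D):
    """Sum of all entries of a traffic matrix."""
    return sum(sum(row) for row in D)


def pm_ratio(W_vmp, W_packing):
    """eta_s = |W'| / |W|: PMs used by VMP relative to VM-Packing."""
    return len(W_vmp) / len(W_packing)


def traffic_ratio(D_vmp, D_packing):
    """eta_t: total inter-PM traffic of VMP relative to VM-Packing."""
    return total_traffic(D_vmp) / total_traffic(D_packing)


# Hypothetical outputs: VMP uses 5 PMs, VM-Packing uses 4.
print(pm_ratio(range(5), range(4)))                       # 1.25
print(traffic_ratio([[0, 6], [6, 0]], [[0, 4], [4, 0]]))  # 1.5
```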
First, we vary the number of VMs in a tenant virtual network. Figures 5.8 and 5.9 show
Figure 5.8: The PM ratio $\eta_s$ when the number of VMs ranged from 50 to 600 in steps of 25 and the mean degree of VMs was 8 (curves: normal distribution, log-normal distribution with σ = 1.5, log-normal distribution with σ = 1).
Figure 5.9: The inter-PM traffic ratio $\eta_t$ when the number of VMs ranged from 50 to 600 in steps of 25 and the mean degree of VMs was 8 (curves: normal distribution, log-normal distribution with σ = 1.5, log-normal distribution with σ = 1).
Figure 5.10: The PM ratio $\eta_s$ when the mean degree of VMs ranged from 4 to 14 in steps of 1 and the number of VMs was 200 (curves: normal distribution, log-normal distribution with σ = 1.5, log-normal distribution with σ = 1).
Figure 5.11: The inter-PM traffic ratio $\eta_t$ when the mean degree of VMs ranged from 4 to 14 in steps of 1 and the number of VMs was 200 (curves: normal distribution, log-normal distribution with σ = 1.5, log-normal distribution with σ = 1).
Figure 5.12: The number of PMs when the number of VMs ranged from 50 to 600 in steps of 25 and the mean degree of VMs was 8, for the log-normal distribution with σ = 1.5 (legend: Algorithm VMP, Algorithm VM-Grouping).
Figure 5.13: The total inter-PM traffic volume when the number of VMs ranged from 50 to 600 in steps of 25 and the mean degree of VMs was 8, for the log-normal distribution with σ = 1.5 (legend: Algorithm VMP, Algorithm VM-Grouping).
Figure 5.14: The number of PMs when the mean degree of VMs ranged from 4 to 14 in steps of 1 and the number of VMs was 200, for the log-normal distribution with σ = 1.5 (legend: Algorithm VMP, Algorithm VM-Grouping).
Figure 5.15: The total inter-PM traffic volume when the mean degree of VMs ranged from 4 to 14 in steps of 1 and the number of VMs was 200, for the log-normal distribution with σ = 1.5 (legend: Algorithm VMP, Algorithm VM-Grouping).
Figure 5.16: Simulation results for a fat-tree data center network with K = 16: ratio of the network cut traffic of Algorithm VN-Mapping to that of the random algorithm, versus the number of physical machines (curves: normal distribution, log-normal distribution with σ = 1.5, log-normal distribution with σ = 1).
Figure 5.17: Simulation results for a fat-tree data center network with K = 16: ratio of the network cut traffic of Algorithm VN-Mapping to the reverse cut, versus the number of physical machines (curves: normal distribution, log-normal distribution with σ = 1.5, log-normal distribution with σ = 1).
the PM ratio and the inter-PM traffic ratio as the number of VMs ranges from 50 to 600 in steps of 25, with a mean VM degree of 8. The three curves represent the three edge-weight distributions. The figures show that the PM ratio $\eta_s$ and the inter-PM traffic ratio $\eta_t$ are generally no less than 1, which demonstrates that Algorithm VM-Packing partitions the set of VMs onto fewer PMs with less inter-PM traffic than Algorithm VMP in all three cases. Compared to Algorithm VMP, for the log-normal distribution with $\sigma = 1.5$, Algorithm VM-Packing reduces the number of PMs by up to 20% when there are fewer than 300 VMs and by 20% to 40% when there are more than 300 VMs. For the same distribution, it reduces the inter-PM traffic by 10% to 20% when there are fewer than 300 VMs and by 30% to 45% when there are more than 300 VMs.
Algorithm VM-Packing is more efficient for the log-normal distribution with $\sigma = 1.5$ than with $\sigma = 1$, and more efficient for the log-normal distributions than for the normal distribution when there are more than 300 VMs. Figures 5.8 and 5.9 also show that when the number of VMs is small, the two algorithms perform almost the same for the log-normal distribution with $\sigma = 1$. Occasionally, Algorithm VMP is more efficient than Algorithm VM-Packing, because Algorithm VM-Packing is better suited to Communication-Intensive cases, where the traffic patterns between VMs are more distinct.
Second, we vary the mean degree of the VMs (nodes). Figures 5.10 and 5.11 show the PM ratio and the inter-PM traffic ratio as the mean node degree ranges from 4 to 14 in steps of 1, with 200 VMs. The curves show that Algorithm VM-Packing outperforms Algorithm VMP by 11% to 43% in terms of inter-PM traffic. Algorithm VM-Packing performs similarly to Algorithm VMP when the mean node degree is less than 8, since the traffic is low; as the node degree increases, Algorithm VM-Packing becomes more efficient than Algorithm VMP.
Figures 5.12 to 5.15 plot the results of the two algorithms for the log-normal distribution with $\sigma = 1.5$. Figures 5.12 and 5.13 show how the number of PMs and the inter-PM traffic, respectively, vary with the number of VMs; Figures 5.14 and 5.15 show how they vary with the mean VM degree.
5.4.2 Virtual Network Mapping
After VM packing, we obtain a consolidated virtual network G(W, D) with a set of VM-clusters W and
an inter-VM-cluster traffic matrix D. Algorithm VN-Mapping maps this virtual network onto the data
center network while minimizing the number of active switches. We compare Algorithm VN-Mapping with
a random algorithm in which each VM-cluster is mapped to any PM with equal probability.
The data center network is a three-tier homogeneous fat-tree topology with K = 16. Algorithm
VN-Mapping must be run twice to implement the VN mapping. We denote by $W^{+}_{\mathrm{Random}}$ and
$W^{*}_{\mathrm{Random}}$ the rack-level and pod-level VM-cluster sets produced by the random
algorithm, and by $W^{+}$ and $W^{*}$ the rack-level and pod-level VM-cluster sets produced by
Algorithm VN-Mapping. Flows $d^{+}_{i,j}$ and $d^{*}_{i,j}$ represent the traffic carried in the
aggregation tier and the core tier, respectively. We define the total traffic ratio ηm as

$$\eta_m = \frac{\sum_{i,j \in W^{+}_{\mathrm{Random}}} d^{+}_{i,j} + \sum_{i,j \in W^{*}_{\mathrm{Random}}} d^{*}_{i,j}}{\sum_{i,j \in W^{+}} d^{+}_{i,j} + \sum_{i,j \in W^{*}} d^{*}_{i,j}}$$

where $\sum_{i,j \in W^{+}_{\mathrm{Random}}} d^{+}_{i,j}$ and $\sum_{i,j \in W^{*}_{\mathrm{Random}}} d^{*}_{i,j}$
are the total inter-rack traffic and the total inter-pod traffic obtained by the random algorithm,
respectively, and $\sum_{i,j \in W^{+}} d^{+}_{i,j}$ and $\sum_{i,j \in W^{*}} d^{*}_{i,j}$ are the
total inter-rack traffic and the total inter-pod traffic obtained by Algorithm VN-Mapping,
respectively. Under this definition, ηm > 1 means that Algorithm VN-Mapping leaves less traffic in
the aggregation and core tiers than the random algorithm does.
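To make ηm concrete, the sketch below computes it from placements of VM-clusters on the servers of a k-ary fat-tree. It assumes the standard fat-tree layout (k pods, k/2 edge switches or racks per pod, k/2 servers per rack, hence k³/4 = 1024 servers for K = 16). It counts a cluster pair's traffic in the aggregation tier whenever the two clusters sit in different racks, and additionally in the core tier whenever they sit in different pods. The ratio is taken as the random baseline's upper-tier traffic over that of the mapped placement, so values above 1 favor the mapping. The traffic values, placements, and function names are hypothetical, and the random baseline ignores server capacity.

```python
import random


def location(pm, k):
    """Map a server index to (pod, rack) in a k-ary fat-tree
    (k/2 servers per rack, k/2 racks per pod)."""
    per_rack = k // 2
    per_pod = per_rack * (k // 2)
    return pm // per_pod, (pm % per_pod) // per_rack


def tier_traffic(traffic, placement, k):
    """Return (inter-rack traffic, inter-pod traffic) for a placement
    that maps each VM-cluster to a server index."""
    inter_rack = inter_pod = 0.0
    for (i, j), d in traffic.items():
        pod_i, rack_i = location(placement[i], k)
        pod_j, rack_j = location(placement[j], k)
        if (pod_i, rack_i) != (pod_j, rack_j):
            inter_rack += d      # crosses the aggregation tier
        if pod_i != pod_j:
            inter_pod += d       # also crosses the core tier
    return inter_rack, inter_pod


def eta_m(traffic, random_placement, mapped_placement, k):
    """Upper-tier traffic of the random baseline over that of the mapping."""
    return sum(tier_traffic(traffic, random_placement, k)) / sum(
        tier_traffic(traffic, mapped_placement, k))


K = 16
num_servers = K ** 3 // 4                    # 1024 servers for K = 16
traffic = {(0, 1): 10.0, (2, 3): 8.0, (0, 2): 1.0}
mapped = {0: 0, 1: 1, 2: 8, 3: 9}            # heavy pairs share a rack
rng = random.Random(1)
baseline = {c: rng.randrange(num_servers) for c in range(4)}
print(eta_m(traffic, baseline, mapped, K))
```

In the mapped placement only the light pair (0, 2) crosses racks, and it stays inside one pod, so the denominator is just 1.0; a random placement of four clusters over 1024 servers almost always scatters them across pods and pays every flow twice.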
We test three distributions of edge weights. As shown in Figure (5.16), ηm varies with the number of
PMs and is always greater than 1, so Algorithm VN-Mapping outperforms the random algorithm in all
cases. It is much more efficient under the log-normal distribution with σ = 1.5, where traffic
patterns are the most distinct and identifiable. When the number of PMs exceeds 40, ηm approaches
1.5 for the log-normal distribution with σ = 1.5, 1.15 for the log-normal distribution with σ = 1,
and 1 for the normal distribution.
Algorithm VN-Mapping must be run twice to implement the virtual network mapping, and the two runs
can be performed in either order. In the normal order, it first cuts the VM-cluster set into
rack-level VM-cluster groups and then groups the rack-level groups into pod-level VM-cluster groups.
In the reverse order, it first cuts the VM-cluster set into pod-level VM-cluster groups and then
cuts each pod-level group into rack-level VM-cluster groups. As shown in Figure (5.17), the normal
order performs better than the reverse order.
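The two orders can be illustrated with a generic greedy grouping step that repeatedly merges the two groups exchanging the most traffic while the merged size fits a capacity. This is a hedged sketch of hierarchical partitioning in general, not Algorithm VN-Mapping: the merge rule, the unit-size clusters, and the toy traffic are assumptions. The reverse order here also does not enforce the number of racks per pod.

```python
from itertools import combinations


def greedy_group(sizes, traffic, capacity):
    """Merge groups joined by the heaviest traffic while the merged size stays
    within `capacity`. `traffic` maps frozenset({a, b}) -> traffic."""
    groups = [frozenset([x]) for x in sizes]
    while True:
        best, best_w = None, 0.0
        for g, h in combinations(groups, 2):
            if sum(sizes[x] for x in g | h) > capacity:
                continue
            w = sum(traffic.get(frozenset((a, b)), 0.0) for a in g for b in h)
            if w > best_w:
                best, best_w = (g, h), w
        if best is None:                   # no beneficial merge fits
            return groups
        g, h = best
        groups = [x for x in groups if x not in (g, h)] + [g | h]


def lift(groups, traffic):
    """Aggregate item-level traffic into traffic between group indices."""
    owner = {x: idx for idx, g in enumerate(groups) for x in g}
    lifted = {}
    for pair, w in traffic.items():
        a, b = tuple(pair)
        if owner[a] != owner[b]:
            key = frozenset((owner[a], owner[b]))
            lifted[key] = lifted.get(key, 0.0) + w
    return lifted


def normal_order(sizes, traffic, rack_cap, racks_per_pod):
    """Racks first, then group whole racks into pods."""
    racks = greedy_group(sizes, traffic, rack_cap)
    pods = greedy_group({r: 1 for r in range(len(racks))},
                        lift(racks, traffic), racks_per_pod)
    return [[racks[r] for r in pod] for pod in pods]


def reverse_order(sizes, traffic, rack_cap, racks_per_pod):
    """Pods first, then split each pod into racks."""
    layout = []
    for pod in greedy_group(sizes, traffic, rack_cap * racks_per_pod):
        sub_traffic = {p: w for p, w in traffic.items() if p <= pod}
        layout.append(greedy_group({x: sizes[x] for x in pod}, sub_traffic, rack_cap))
    return layout


def upper_tier_traffic(layout, traffic):
    """Inter-rack plus inter-pod traffic for a layout[pod][rack] -> clusters."""
    where = {x: (p, r) for p, pod in enumerate(layout)
             for r, rack in enumerate(pod) for x in rack}
    total = 0.0
    for pair, w in traffic.items():
        a, b = tuple(pair)
        if where[a] != where[b]:
            total += w                     # aggregation tier
        if where[a][0] != where[b][0]:
            total += w                     # core tier
    return total


sizes = {c: 1 for c in range(8)}
pairs = {(0, 1): 10, (2, 3): 10, (4, 5): 10, (6, 7): 10, (1, 2): 5, (5, 6): 5, (3, 4): 1}
traffic = {frozenset(p): float(w) for p, w in pairs.items()}
for name, order in (("normal", normal_order), ("reverse", reverse_order)):
    layout = order(sizes, traffic, rack_cap=2, racks_per_pod=2)
    print(name, upper_tier_traffic(layout, traffic))
```

On this toy input both orders reach the same layout; which order wins depends on the input, and the sketch makes no attempt to reproduce the comparison in Figure (5.17).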
5.5 Summary
This chapter explores the virtual network resource allocation problem in multi-tenant data centers.
It proposes a solution that maps a virtual network onto a data center network while minimizing the
total number of servers and switches required, subject to link bandwidth and server computation
capacity constraints. The solution consists of three phases: traffic-aware virtual machine (VM)
packing, network-aware virtual network (VN) mapping, and power-aware flow routing. VM Packing
minimizes the number of physical machines (servers) required to host all the VMs. VN Mapping assigns
VM-clusters to physical servers so that VM pairs with more traffic are placed closer together,
minimizing network traffic. Flow routing consolidates flows onto a smaller number of links and
switches, leaving more links and switches idle so that they can be powered off. The algorithms in
all three phases respect the bandwidth guarantees of flows. The experimental results show that our
solution significantly outperforms existing VM placement solutions; for example, it reduces
inter-PM traffic by up to 45% relative to Algorithm VMP.
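The flow routing phase is only summarized here. As a hedged sketch of the consolidation idea alone (a generic greedy heuristic, not the thesis's routing algorithm), each flow, taken in order of decreasing demand, can be routed on the feasible candidate path that wakes the fewest idle switches. The candidate path lists, the uniform link capacity, and the switch names are assumptions.

```python
def consolidate_flows(flows, capacity):
    """Greedy power-aware routing sketch.

    `flows` maps flow id -> (demand, [candidate paths]); each path is a list of
    switch names, and every link has the same `capacity`."""
    load = {}            # (switch_a, switch_b) -> reserved bandwidth
    active = set()       # switches carrying at least one flow
    routes = {}
    for fid, (demand, paths) in sorted(flows.items(), key=lambda kv: -kv[1][0]):
        best = None
        for path in paths:
            links = [tuple(sorted(pair)) for pair in zip(path, path[1:])]
            if any(load.get(link, 0.0) + demand > capacity for link in links):
                continue     # would break a bandwidth guarantee
            cost = len(set(path) - active)     # idle switches this path wakes
            if best is None or cost < best[0]:
                best = (cost, path, links)
        if best is None:
            raise RuntimeError(f"no feasible path for flow {fid}")
        _, path, links = best
        for link in links:
            load[link] = load.get(link, 0.0) + demand
        active.update(path)
        routes[fid] = path
    return routes, active


flows = {
    "f1": (4.0, [["e1", "a1", "e2"], ["e1", "a2", "e2"]]),
    "f2": (3.0, [["e1", "a2", "e2"], ["e1", "a1", "e2"]]),
}
routes, active = consolidate_flows(flows, capacity=10.0)
print(routes, sorted(active))
```

With a link capacity of 10, both flows share aggregation switch a1 and a2 stays idle; lowering the capacity to 6 forces f2 onto a2, showing how bandwidth guarantees limit how far flows can be consolidated.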
CHAPTER 6
Conclusions
Cloud data centers are becoming increasingly popular for providing computing and storage resources.
However, the energy consumption of these data centers has skyrocketed and is becoming a heavy burden
for cloud providers. The energy consumption of cloud data centers comes mainly from three sources:
servers, network switches, and cooling systems. Several techniques are used to reduce energy
consumption in data centers, such as virtualization, Dynamic Voltage/Frequency Scaling (DVFS), and
Dynamic Power Management (DPM). Virtualization can consolidate the workload of a data center onto the
minimum number of servers and switches, so that unneeded servers and switches can be turned off to
save power. DVFS reduces the power consumption of a server by adjusting its operating speed. DPM
reduces power consumption by switching a server between the sleep mode and the active mode.
However, reducing the power consumption of data centers may degrade their performance. For example,
a job's response time may grow so long that the job misses its deadline, or a flow may not receive
its required bandwidth, degrading application performance. Therefore, application performance must
be taken into account while minimizing the energy consumption of data centers.
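The trade-off behind DVFS can be made concrete with a common textbook power model, used here as an assumption rather than the model of any particular chapter: P(f) = P_s + βf³, where P_s is static power and βf³ is dynamic power when supply voltage scales roughly linearly with frequency. A job of W cycles takes W/f time, so its energy E(f) = W(P_s/f + βf²) is minimized at the critical frequency f* = (P_s/(2β))^(1/3). Running slower than f* wastes static energy; running faster wastes dynamic energy. The parameter values below are hypothetical.

```python
def energy(f, work, p_static, beta):
    """Energy to run `work` cycles at frequency f under P(f) = p_static + beta*f**3."""
    return (p_static + beta * f ** 3) * work / f


def critical_frequency(p_static, beta):
    """Minimizer of p_static/f + beta*f**2, from setting its derivative to zero."""
    return (p_static / (2.0 * beta)) ** (1.0 / 3.0)


# Hypothetical normalized parameters (frequencies in units of f_max).
P_S, BETA, WORK = 0.2, 1.0, 1.0
f_star = critical_frequency(P_S, BETA)
grid = [i / 1000 for i in range(50, 1001)]
f_best = min(grid, key=lambda f: energy(f, WORK, P_S, BETA))
print(f"analytic f* = {f_star:.3f}, grid search = {f_best:.3f}")
```

In practice f* would be clamped to the processor's available frequency range and raised further if deadlines or response-time constraints require it, which is exactly the performance tension described above.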
This thesis analyzes server and network energy consumption and proposes solutions that improve the
energy efficiency of cloud data centers under performance constraints.
First, this thesis introduces PowerSleep, a smart power-saving scheme for a single server.
PowerSleep minimizes the power consumption of a single server under a mean job response time
constraint. It adjusts the server's operating frequency at run time through DVFS, puts the server
into the sleep mode once the queue is empty, and wakes the server once a new job arrives. To reduce
the mode transition overhead, PowerSleep adds a procrastination sleep period when a new job arrives
while the server is still in the sleep mode. The server keeps sleeping during this period to collect
more jobs in the queue, and then wakes up to process them. This approach reduces the mode transition
overhead at the cost of a longer job response time.
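The effect of procrastination on transitions and response time can be seen in a toy simulation. This is not PowerSleep itself: it omits DVFS speed selection and the response-time-constrained optimization, and the fixed service time, wake-up latency, and arrival trace are made up for illustration.

```python
def simulate_sleep(arrivals, service, procrastination, wakeup):
    """Single FIFO server that sleeps whenever its queue empties.

    A job arriving at a sleeping server triggers `procrastination` more time
    asleep, then `wakeup` time to wake. Returns (wake-ups, mean response time)."""
    wakeups, total_response = 0, 0.0
    free_at = float("-inf")    # end of the current (or last) busy period
    for t in sorted(arrivals):
        if t >= free_at:       # server is asleep when this job arrives
            wakeups += 1
            start = t + procrastination + wakeup
        else:                  # job queues behind work already scheduled
            start = free_at
        free_at = start + service
        total_response += free_at - t
    return wakeups, total_response / len(arrivals)


arrivals = [0, 1, 2, 10, 11, 30]
for p in (0, 8):
    print(p, simulate_sleep(arrivals, service=1, procrastination=p, wakeup=2))
```

With a procrastination period of 8, the jobs arriving at times 10 and 11 join the first busy period, cutting wake-ups from 3 to 2 while the mean response time rises from 3 to about 8.7; choosing the period so that the response time stays within its bound is the constrained problem PowerSleep addresses.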
Second, this thesis investigates energy-efficient real-time task scheduling on multi-core processors
under the voltage island model. It introduces an energy-efficient algorithm, Voltage Island Largest
Capacity First (VILCF), to schedule real-time tasks onto cores. VILCF fully utilizes the remaining
capacity of cores without increasing the power consumption of the chip. The thesis also presents a
detailed theoretical analysis of the approximation ratio of VILCF in terms of energy efficiency.
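Spare capacity can be used without extra power because all cores on a voltage island share one supply voltage and frequency, which must be high enough for the most heavily loaded core. A minimal sketch, assuming a normalized island frequency equal to its maximum core utilization and a per-core power P_s + βf³ with hypothetical P_s = 0.1 and β = 1. The placement rule (least-utilized core, i.e., largest remaining capacity) only mirrors the idea in VILCF's name; it is not the VILCF algorithm and carries none of its guarantees.

```python
def island_power(core_utils, p_static=0.1, beta=1.0):
    """Power of one voltage island: every core runs at the shared frequency,
    which must cover the most utilized core (normalized so f_max = 1)."""
    f = max(core_utils)
    return len(core_utils) * (p_static + beta * f ** 3)


def place_on_largest_capacity(core_utils, task_util):
    """Assign a task to the core with the most remaining capacity
    (the least-utilized core). Illustrative rule only."""
    target = min(range(len(core_utils)), key=lambda i: core_utils[i])
    new_utils = list(core_utils)
    new_utils[target] += task_util
    return new_utils


before = [0.8, 0.3, 0.5, 0.2]
after = place_on_largest_capacity(before, 0.4)    # 0.2 + 0.4 stays below 0.8
print(island_power(before), island_power(after))  # equal: frequency unchanged
```

A task of utilization 0.7 placed on the 0.3 core would instead push that core to 1.0, raising the shared frequency and the power of every core on the island.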
Finally, this thesis explores the energy-efficient resource allocation problem for tenants in cloud
data centers under bandwidth guarantees. The solution consists of three phases: VM packing, VN
mapping, and flow routing. The overall problem is complex; each phase on its own is NP-hard. The
first two phases, VM packing and VN mapping, are graph partitioning problems whose goal is to place
VMs that exchange heavy traffic as close together as possible. The third phase, flow routing,
aggregates VM-pair flows onto the minimum number of switches. The proposed solution improves resource
utilization in data centers and thus reduces their energy consumption.
In conclusion, the contributions of this thesis to optimizing the energy consumption of servers and
networks in data centers will enable cloud providers to offer scalable services with lower energy
consumption, costs, and CO2 emissions.
BIBLIOGRAPHY
[1] Amazon, E., “Amazon elastic compute cloud (Amazon EC2),” Amazon Elastic Compute
Cloud (Amazon EC2), 2010.
[2] Thibodeau, P., “Data centers are the new polluters,” http://
www.computerworld.com/article/2598562/data-center/
data-centers-are-the-new-polluters.html, Accessed: 07-15-2015.
[3] Liu, J., Zhao, F., Liu, X., and He, W., “Challenges towards elastic power management in
internet data centers,” Distributed Computing Systems Workshops, 2009. ICDCS Workshops’
09. 29th IEEE International Conference on, IEEE, 2009, pp. 65–72.
[4] Srikantaiah, S., Kansal, A., and Zhao, F., “Energy aware consolidation for cloud computing,”
Proceedings of the 2008 conference on Power aware computing and systems, Vol. 10, San
Diego, California, 2008.
[5] Lackey, D. E., Zuchowski, P. S., Bednar, T. R., Stout, D. W., Gould, S. W., and Cohn, J. M.,
“Managing power and performance for system-on-chip designs using voltage islands,” Com-
puter Aided Design, 2002. ICCAD 2002. IEEE/ACM International Conference on, IEEE,
2002, pp. 195–202.
[6] Buurma, J. and Cooke, L., “Low-power design using multiple VTH ASIC libraries,” SoC
central, 2004.
[7] Choi, K., Dynamic Voltage and Frequency Scaling for energy-efficient system design, Uni-
versity of Southern California, 2005.
[8] Teodorescu, R. and Torrellas, J., “Variation-aware application scheduling and power manage-
ment for chip multiprocessors,” ACM SIGARCH Computer Architecture News, Vol. 36, IEEE
Computer Society, 2008, pp. 363–374.
[9] Suh, J., Kang, D.-I., and Crago, S. P., “Dynamic power management of multiprocessor sys-
tems,” Parallel and Distributed Processing Symposium, International, Vol. 2, IEEE Computer
Society, 2002, pp. 0097b–0097b.
[10] Ballani, H., Costa, P., Karagiannis, T., and Rowstron, A., “Towards predictable datacenter
networks,” ACM SIGCOMM Computer Communication Review, Vol. 41, ACM, 2011, pp.
242–253.
[11] Lee, J., Turner, Y., Lee, M., Popa, L., Banerjee, S., Kang, J.-M., and Sharma, P., “Application-
driven bandwidth guarantees in datacenters,” Proceedings of the 2014 ACM conference on
SIGCOMM, ACM, 2014, pp. 467–478.
[12] Popa, L., Yalagandula, P., Banerjee, S., Mogul, J. C., Turner, Y., and Santos, J. R., “Elastic-
switch: practical work-conserving bandwidth guarantees for cloud computing,” ACM SIG-
COMM Computer Communication Review, Vol. 43, ACM, 2013, pp. 351–362.
[13] Guo, C., Lu, G., Wang, H. J., Yang, S., Kong, C., Sun, P., Wu, W., and Zhang, Y., “Secondnet:
a data center network virtualization architecture with bandwidth guarantees,” Proceedings of
the 6th International Conference, ACM, 2010, p. 15.
[14] Benson, T., Akella, A., Shaikh, A., and Sahu, S., “CloudNaaS: a cloud networking platform
for enterprise applications,” Proceedings of the 2nd ACM Symposium on Cloud Computing,
ACM, 2011, p. 8.
[15] Wang, S., Liu, J., Chen, J.-J., and Liu, X., “Powersleep: a smart power-saving scheme with
sleep for servers under response time constraint,” Emerging and Selected Topics in Circuits
and Systems, IEEE Journal on, Vol. 1, No. 3, 2011, pp. 289–298.
[16] Liu, J. and Guo, J., “Energy efficient scheduling of real-time tasks on multi-core processors
with voltage islands,” Future Generation Computer Systems, 2015.
[17] Wang, S., Munawar, W., Liu, J., Chen, J.-J., and Liu, X., “Power-saving design for server
farms with response time percentile guarantees,” Real-Time and Embedded Technology and
Applications Symposium (RTAS), 2012 IEEE 18th, IEEE, 2012, pp. 273–284.
[18] Wang, S., Chen, J.-J., Liu, J., and Liu, X., “Power saving design for servers under response
time constraint,” Real-Time Systems (ECRTS), 2010 22nd Euromicro Conference on, IEEE,
2010, pp. 123–132.
[19] Liu, J. and Guo, J., “Voltage Island Aware Energy Efficient Scheduling of Real-Time Tasks
on Multi-core Processors,” High Performance Computing and Communications, 2014 IEEE
6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded
Software and Syst (HPCC, CSS, ICESS), 2014 IEEE Intl Conf on, IEEE, 2014, pp. 645–652.
[20] Liu, J. and Guo, J., “Traffic and Energy Aware Virtual Network Resource Allocation in Data
Centers,” The 7th IEEE International Conference on Cloud Computing Technology and Sci-
ence, IEEE, 2015.
[21] Von Kaenel, V., Macken, P., and Degrauwe, M. G., “A voltage reduction technique for
battery-operated systems,” Solid-State Circuits, IEEE Journal of, Vol. 25, No. 5, 1990,
pp. 1136–1140.
[22] Kim, N. S., Austin, T., Blaauw, D., Mudge, T., Flautner, K., Hu, J. S., Irwin, M. J.,
Kandemir, M., and Narayanan, V., “Leakage current: Moore’s law meets static power,” Computer,
Vol. 36, No. 12, 2003, pp. 68–75.
[23] Kolpe, T., Zhai, A., and Sapatnekar, S. S., “Enabling improved power management in multi-
core processors through clustered DVFS,” Design, Automation & Test in Europe Conference
& Exhibition (DATE), 2011, IEEE, 2011, pp. 1–6.
[24] Raje, S. and Sarrafzadeh, M., “Variable voltage scheduling,” Proceedings of the 1995 inter-
national symposium on Low power design, ACM, 1995, pp. 9–14.
[25] Chang, J.-M. and Pedram, M., “Energy minimization using multiple supply voltages,” Very
Large Scale Integration (VLSI) Systems, IEEE Transactions on, Vol. 5, No. 4, 1997, pp. 436–
443.
[26] Wu, H., Liu, I.-M., Wong, M. D., and Wang, Y., “Post-placement voltage island generation
under performance requirement,” Computer-Aided Design, 2005. ICCAD-2005. IEEE/ACM
International Conference on, IEEE, 2005, pp. 309–316.
[27] Benini, L., Bogliolo, A., and De Micheli, G., “A survey of design techniques for system-level
dynamic power management,” Very Large Scale Integration (VLSI) Systems, IEEE Transac-
tions on, Vol. 8, No. 3, 2000, pp. 299–316.
[28] Brock, B. and Rajamani, K., “Dynamic power management for embedded systems,” Pro-
ceedings of the IEEE SOC Conference, 2003, pp. 1–25.
[29] Soteriou, V. and Peh, L.-S., “Dynamic power management for power optimization of inter-
connection networks using on/off links,” High Performance Interconnects, 2003. Proceed-
ings. 11th Symposium on, IEEE, 2003, pp. 15–20.
[30] Geer, D., “Chip makers turn to multicore processors,” Computer, Vol. 38, No. 5, 2005,
pp. 11–13.
[31] Krishnan, V. and Torrellas, J., “A chip-multiprocessor architecture with speculative multi-
threading,” Computers, IEEE Transactions on, Vol. 48, No. 9, 1999, pp. 866–880.
[32] Talpes, E. and Marculescu, D., “Toward a multiple clock/voltage island design style for
power-aware processors,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions
on, Vol. 13, No. 5, 2005, pp. 591–603.
[33] Hu, J., Shin, Y., Dhanwada, N., and Marculescu, R., “Architecting voltage islands in core-
based system-on-a-chip designs,” Proceedings of the 2004 international symposium on Low
power electronics and design, ACM, 2004, pp. 180–185.
[34] Lee, W.-P., Liu, H.-Y., and Chang, Y.-W., “Voltage island aware floorplanning for power
and timing optimization,” Proceedings of the 2006 IEEE/ACM international conference on
Computer-aided design, ACM, 2006, pp. 389–394.
[35] Yang, S., Wolf, W., Vijaykrishnan, N., and Xie, Y., “Reliability-aware soc voltage islands par-
tition and floorplan,” Emerging VLSI Technologies and Architectures, 2006. IEEE Computer
Society Annual Symposium on, IEEE, 2006, pp. 6–pp.
[36] Pillai, P. and Shin, K. G., “Real-time dynamic voltage scaling for low-power embedded oper-
ating systems,” ACM SIGOPS Operating Systems Review, Vol. 35, ACM, 2001, pp. 89–102.
[37] Aydin, H., Melhem, R., Mossé, D., and Mejia-Alvarez, P., “Dynamic and aggressive
scheduling techniques for power-aware real-time systems,” Proceedings of the 22nd IEEE Real-Time
Systems Symposium (RTSS), IEEE, 2001, pp. 95–105.
[38] Chen, J.-J., Hsu, H.-R., and Kuo, T.-W., “Leakage-aware energy-efficient scheduling of real-
time tasks in multiprocessor systems,” Real-Time and Embedded Technology and Applica-
tions Symposium, 2006. Proceedings of the 12th IEEE, IEEE, 2006, pp. 408–417.
[39] Aydin, H. and Yang, Q., “Energy-aware partitioning for multiprocessor real-time systems,”
Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS),
IEEE, 2003, pp. 113–121.
[40] Mishra, R., Rastogi, N., Zhu, D., Mossé, D., and Melhem, R., “Energy aware scheduling
for distributed real-time systems,” Proceedings of the International Parallel and Distributed
Processing Symposium (IPDPS), IEEE, 2003, pp. 21–21.
[41] AlEnawy, T. A. and Aydin, H., “Energy-aware task allocation for rate monotonic schedul-
ing,” Proceedings of the 11th IEEE Real Time and Embedded Technology and Applications
Symposium (RTAS), IEEE, 2005, pp. 213–223.
[42] Seo, E., Jeong, J., Park, S., and Lee, J., “Energy efficient scheduling of real-time tasks on mul-
ticore processors,” Parallel and Distributed Systems, IEEE Transactions on, Vol. 19, No. 11,
2008, pp. 1540–1552.
[43] Qi, X. and Zhu, D.-K., “Energy Efficient Block-Partitioned Multicore Processors for Parallel
Applications,” Journal of Computer Science and Technology, Vol. 26, No. 3, 2011, pp. 418–
433.
[44] Ozturk, O., Kandemir, M., and Chen, G., “Compiler-directed energy reduction using dynamic
voltage scaling and voltage islands for embedded systems,” Computers, IEEE Transactions
on, Vol. 62, No. 2, 2013, pp. 268–278.
[45] Pagani, S. and Chen, J.-J., “Energy efficiency analysis for the single frequency approxima-
tion (SFA) scheme,” ACM Transactions on Embedded Computing Systems (TECS), Vol. 13,
No. 5s, 2014, pp. 158.
[46] Devadas, V. and Aydin, H., “Coordinated power management of periodic real-time tasks on
chip multiprocessors,” Green Computing Conference, 2010 International, IEEE, 2010, pp.
61–72.
[47] Fanxin, K., Wang, Y., and Qingxu, D., “Energy-Efficient Scheduling of Real-Time Tasks on
Cluster-Based Multicores,” EDAA, 2011.
[48] Sheshadri, V., Agrawal, V. D., and Agrawal, P., “Power-aware SoC test optimization through
dynamic voltage and frequency scaling,” Very Large Scale Integration (VLSI-SoC), 2013
IFIP/IEEE 21st International Conference on, IEEE, 2013, pp. 102–107.
[49] Bari, M. F., Boutaba, R., Esteves, R., Granville, L. Z., Podlesny, M., Rabbani, M. G., Zhang,
Q., and Zhani, M. F., “Data center network virtualization: A survey,” Communications Sur-
veys & Tutorials, IEEE, Vol. 15, No. 2, 2013, pp. 909–928.
[50] Stage, A. and Setzer, T., “Network-aware migration control and scheduling of differentiated
virtual machine workloads,” Proceedings of the 2009 ICSE Workshop on Software Engineer-
ing Challenges of Cloud Computing, IEEE Computer Society, 2009, pp. 9–14.
[51] Beloglazov, A. and Buyya, R., “Energy efficient resource management in virtualized cloud
data centers,” Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster,
Cloud and Grid Computing, IEEE Computer Society, 2010, pp. 826–831.
[52] Wu, C.-M., Chang, R.-S., and Chan, H.-Y., “A green energy-efficient scheduling algorithm
using the DVFS technique for cloud datacenters,” Future Generation Computer Systems,
Vol. 37, 2014, pp. 141–147.
[53] Wang, L., Zhang, F., Arjona Aroca, J., Vasilakos, A. V., Zheng, K., Hou, C., Li, D., and
Liu, Z., “GreenDCN: A general framework for achieving energy efficiency in data center
networks,” Selected Areas in Communications, IEEE Journal on, Vol. 32, No. 1, 2014, pp. 4–
15.
[54] Wang, M., Meng, X., and Zhang, L., “Consolidating virtual machines with dynamic band-
width demand in data centers,” INFOCOM, 2011 Proceedings IEEE, IEEE, 2011, pp. 71–75.
[55] Popa, L., Kumar, G., Chowdhury, M., Krishnamurthy, A., Ratnasamy, S., and Stoica, I.,
“FairCloud: sharing the network in cloud computing,” Proceedings of the ACM SIGCOMM
2012 conference on Applications, technologies, architectures, and protocols for computer
communication, ACM, 2012, pp. 187–198.
[56] Rodrigues, H., Santos, J. R., Turner, Y., Soares, P., and Guedes, D., “Gatekeeper: Supporting
Bandwidth Guarantees for Multi-tenant Datacenter Networks,” WIOV, 2011.
[57] Jiang, J. W., Lan, T., Ha, S., Chen, M., and Chiang, M., “Joint VM placement and routing
for data center traffic engineering,” INFOCOM, 2012 Proceedings IEEE, IEEE, 2012, pp.
2876–2880.
[58] Wang, S.-H., Huang, P. P.-W., Wen, C. H.-P., and Wang, L.-C., “EQVMP: Energy-efficient
and QoS-aware virtual machine placement for software defined datacenter networks,” Infor-
mation Networking (ICOIN), 2014 International Conference on, IEEE, 2014, pp. 220–225.
[59] Meng, X., Pappas, V., and Zhang, L., “Improving the scalability of data center networks with
traffic-aware virtual machine placement,” INFOCOM, 2010 Proceedings IEEE, IEEE, 2010,
pp. 1–9.
[60] Dias, D. S. and Costa, L. H. M., “Online traffic-aware virtual machine placement in data cen-
ter networks,” Global Information Infrastructure and Networking Symposium (GIIS), 2012,
IEEE, 2012, pp. 1–8.
[61] Fang, W., Liang, X., Li, S., Chiaraviglio, L., and Xiong, N., “VMPlanner: Optimizing virtual
machine placement and traffic flow routing to reduce network power costs in cloud data
centers,” Computer Networks, Vol. 57, No. 1, 2013, pp. 179–196.
[62] Li, X., Wu, J., Tang, S., and Lu, S., “Let’s stay together: Towards traffic aware virtual machine
placement in data centers,” INFOCOM, 2014 Proceedings IEEE, IEEE, 2014, pp. 1842–1850.
[63] Duffield, N. G., Goyal, P., Greenberg, A., Mishra, P., Ramakrishnan, K. K., and van der
Merwe, J. E., “A flexible model for resource management in virtual private networks,” ACM
SIGCOMM Computer Communication Review, Vol. 29, ACM, 1999, pp. 95–108.
[64] Zhu, J., Li, D., Wu, J., Liu, H., Zhang, Y., and Zhang, J., “Towards bandwidth guarantee
in multi-tenancy cloud computing networks,” Network Protocols (ICNP), 2012 20th IEEE
International Conference on, IEEE, 2012, pp. 1–10.
[65] Ballani, H., Jang, K., Karagiannis, T., Kim, C., Gunawardena, D., and O’Shea, G., “Chatty
Tenants and the Cloud Network Sharing Problem.” NSDI, 2013, pp. 171–184.
[66] Xie, D., Ding, N., Hu, Y. C., and Kompella, R., “The only constant is change: incorporating
time-varying network reservations in data centers,” ACM SIGCOMM Computer Communica-
tion Review, Vol. 42, No. 4, 2012, pp. 199–210.
[67] Shen, M., Gao, L., Xu, K., and Zhu, L., “Achieving bandwidth guarantees in multi-tenant
cloud networks using a dual-hose model,” Performance Computing and Communications
Conference (IPCCC), 2014 IEEE International, IEEE, 2014, pp. 1–8.
[68] Greenberg, A., Hamilton, J. R., Jain, N., Kandula, S., Kim, C., Lahiri, P., Maltz, D. A., Patel,
P., and Sengupta, S., “VL2: a scalable and flexible data center network,” ACM SIGCOMM
Computer Communication Review, Vol. 39, ACM, 2009, pp. 51–62.
[69] Niranjan Mysore, R., Pamboris, A., Farrington, N., Huang, N., Miri, P., Radhakrishnan,
S., Subramanya, V., and Vahdat, A., “Portland: a scalable fault-tolerant layer 2 data center
network fabric,” ACM SIGCOMM Computer Communication Review, Vol. 39, ACM, 2009,
pp. 39–50.
[70] Al-Fares, M., Loukissas, A., and Vahdat, A., “A scalable, commodity data center network
architecture,” ACM SIGCOMM Computer Communication Review, Vol. 38, No. 4, 2008,
pp. 63–74.
[71] Guo, C., Wu, H., Tan, K., Shi, L., Zhang, Y., and Lu, S., “Dcell: a scalable and fault-
tolerant network structure for data centers,” ACM SIGCOMM Computer Communication Re-
view, Vol. 38, ACM, 2008, pp. 75–86.
[72] Guo, C., Lu, G., Li, D., Wu, H., Zhang, X., Shi, Y., Tian, C., Zhang, Y., and Lu, S., “BCube:
a high performance, server-centric network architecture for modular data centers,” ACM SIG-
COMM Computer Communication Review, Vol. 39, No. 4, 2009, pp. 63–74.
[73] Heller, B., Seetharaman, S., Mahadevan, P., Yiakoumis, Y., Sharma, P., Banerjee, S., and
McKeown, N., “ElasticTree: Saving Energy in Data Center Networks.” NSDI, Vol. 10, 2010,
pp. 249–264.
[74] Shang, Y., Li, D., and Xu, M., “Energy-aware routing in data center network,” Proceedings
of the first ACM SIGCOMM workshop on Green networking, ACM, 2010, pp. 1–8.
[75] Alonso, M., Coll, S., Martínez, J.-M., Santonja, V., López, P., and Duato, J., “Dynamic power
saving in fat-tree interconnection networks using on/off links,” Parallel and Distributed Pro-
cessing Symposium, 2006. IPDPS 2006. 20th International, IEEE, 2006, pp. 8–pp.
[76] Kliazovich, D., Bouvry, P., and Khan, S. U., “DENS: data center energy-efficient network-
aware scheduling,” Cluster computing, Vol. 16, No. 1, 2013, pp. 65–75.
[77] Barroso, L. A. and Hölzle, U., “The case for energy-proportional computing,” Computer,
Vol. 40, No. 12, 2007, pp. 33–37.
[78] Levy, H. and Kleinrock, L., “A queue with starter and a queue with vacations: delay analysis
by decomposition,” Operations Research, Vol. 34, No. 3, 1986, pp. 426–436.
[79] Garey, M. R. and Johnson, D. S., “Computers and intractability: a guide to the theory of
NP-completeness,” WH Freeman & Co., San Francisco, 1979.
[80] Chen, J.-J., Hsu, H.-R., Chuang, K.-H., Yang, C.-L., Pang, A.-C., and Kuo, T.-W., “Multipro-
cessor energy-efficient scheduling with task migration considerations,” Real-Time Systems,
2004. ECRTS 2004. Proceedings. 16th Euromicro Conference on, IEEE, 2004, pp. 101–108.
[81] Andreev, K. and Racke, H., “Balanced graph partitioning,” Theory of Computing Systems,
Vol. 39, No. 6, 2006, pp. 929–939.
[82] Anderson, R. J., Mayr, E. W., and Warmuth, M. K., “Parallel approximation algorithms for
bin packing,” Information and Computation, Vol. 82, No. 3, 1989, pp. 262–277.
[83] Leinberger, W., Karypis, G., and Kumar, V., “Multi-capacity bin packing algorithms with
applications to job scheduling under multiple constraints,” 1999 International Conference on
Parallel Processing, IEEE, 1999, pp. 404–412.
