Extending Demand Response to Tenants in Cloud Data Centers via Non-intrusive Workload Flexibility Pricing
Participating in demand response programs is a promising tool for reducing
energy costs in data centers by modulating energy consumption. Towards this
end, data centers can employ a rich set of resource management knobs, such as
workload shifting and dynamic server provisioning. Nonetheless, these knobs may
not be readily available in a cloud data center (CDC) that serves cloud
tenants/users, because workloads in CDCs are managed by the tenants themselves,
who are typically charged under usage-based or flat-rate pricing and often
have no incentive to cooperate with the CDC operator on demand response and
cost saving. To overcome this "split incentive" hurdle, a few recent
studies have tried market-based mechanisms, such as dynamic pricing, inside
CDCs. However, such mechanisms often rely on complex designs that are hard to
implement and difficult for tenants to cope with. To address this limitation, we
propose a novel incentive mechanism that is not dynamic, i.e., it keeps the pricing
for cloud resources unchanged over a long period. While it charges tenants under
Usage-based Pricing (UP), as used by today's major cloud operators, it
rewards tenants in proportion to the length of the deadlines they set for
completing their workloads. This new mechanism is called
Usage-based Pricing with Monetary Reward (UPMR). We demonstrate the
effectiveness of UPMR both analytically and empirically. We show that UPMR can
reduce the CDC operator's energy cost by 12.9% while increasing its profit by
4.9%, compared to the state-of-the-art approaches used by today's CDC operators
to charge their tenants.
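The abstract does not give UPMR's exact reward formula, so the following is only an illustration under stated assumptions: a tenant pays a flat unit price per resource-hour of usage (plain UP), minus a reward that grows linearly with the deadline length the tenant declares. The function name upmr_bill and the parameters unit_price and reward_rate are hypothetical, and the reward is deliberately left uncapped, so a very long deadline could push the bill below zero.

```python
def upmr_bill(usage_hours: float, unit_price: float,
              deadline_hours: float, reward_rate: float) -> float:
    """Hypothetical UPMR-style bill: usage charge minus a deadline-proportional reward.

    Assumptions (not from the paper): linear usage charge, linear uncapped reward.
    """
    if usage_hours < 0 or deadline_hours < 0:
        raise ValueError("usage and deadline must be non-negative")
    usage_charge = unit_price * usage_hours   # plain usage-based pricing (UP)
    reward = reward_rate * deadline_hours     # longer deadline -> more flexibility -> larger reward
    return usage_charge - reward
```

For example, 10 resource-hours at 0.5 per hour with a 4-hour deadline and a reward rate of 0.1 gives a bill of 5.0 - 0.4 = 4.6 (up to floating-point rounding); declaring a longer deadline lowers the bill, which is the incentive that gives the operator room to shift load.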
DRS: Dynamic Resource Scheduling for Real-Time Analytics over Fast Streams
In a data stream management system (DSMS), users register continuous queries,
and receive result updates as data arrive and expire. We focus on applications
with real-time constraints, in which the user must receive each result update
within a given period after the update occurs. To handle fast data, the DSMS is
commonly placed on top of a cloud infrastructure. Because stream properties
such as arrival rates can fluctuate unpredictably, cloud resources must be
dynamically provisioned and scheduled accordingly to ensure real-time response.
For existing systems and future developments alike, it is essential to be able
to schedule resources dynamically according to the current workload, in order
to avoid both wasting resources and failing to deliver correct results on time.
Motivated by this, we propose DRS, a novel dynamic
resource scheduler for cloud-based DSMSs. DRS overcomes three fundamental
challenges: (a) how to model the relationship between the provisioned resources
and query response time; (b) where best to place resources; and (c) how to
measure system load with minimal overhead. In particular, DRS includes an
accurate performance model based on the theory of Jackson open queueing
networks and is capable of handling arbitrary operator topologies,
possibly with loops, splits and joins. Extensive experiments with real data
confirm that DRS achieves real-time response with close-to-optimal resource
consumption.
Comment: This is our latest version with certain modifications
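To make the Jackson-network idea concrete, here is a textbook-style Python sketch, not DRS's actual model. Assumptions: each operator is an M/M/c station with c processors and per-processor service rate mu; the traffic equations lambda_j = gamma_j + sum_i lambda_i * P[i][j] (gamma = external arrival rates, P = routing probabilities) are solved by fixed-point iteration, which converges for an open network even with loops; and Little's law turns per-operator queue lengths into an end-to-end mean response time. The greedy allocator is a hypothetical illustration of "where to best place resources"; all function names are assumptions.

```python
import math
from typing import List


def traffic_rates(gamma: List[float], routing: List[List[float]],
                  iters: int = 10_000, tol: float = 1e-12) -> List[float]:
    """Solve lambda_j = gamma_j + sum_i lambda_i * routing[i][j] by fixed-point iteration."""
    n = len(gamma)
    lam = list(gamma)
    for _ in range(iters):
        new = [gamma[j] + sum(lam[i] * routing[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, lam)) < tol:
            return new
        lam = new
    raise RuntimeError("traffic equations did not converge; is the network open?")


def mmc_sojourn(lam: float, mu: float, c: int) -> float:
    """Mean time in an M/M/c station (queueing + service), via the Erlang C formula."""
    if lam == 0:
        return 1.0 / mu
    a = lam / mu                      # offered load in Erlangs
    if a >= c:
        return math.inf               # unstable: the queue grows without bound
    tail = a ** c / math.factorial(c) * c / (c - a)
    p_wait = tail / (sum(a ** k / math.factorial(k) for k in range(c)) + tail)
    return p_wait / (c * mu - lam) + 1.0 / mu


def mean_response_time(gamma: List[float], routing: List[List[float]],
                       mu: List[float], servers: List[int]) -> float:
    """End-to-end mean sojourn time of an external tuple, via Little's law."""
    lam = traffic_rates(gamma, routing)
    in_system = sum(l * mmc_sojourn(l, m, c) for l, m, c in zip(lam, mu, servers))
    return in_system / sum(gamma)


def greedy_allocate(gamma: List[float], routing: List[List[float]],
                    mu: List[float], budget: int) -> List[int]:
    """Hypothetical greedy: first make every operator stable, then give each spare
    processor to the operator whose extra processor lowers response time the most."""
    lam = traffic_rates(gamma, routing)
    servers = [max(1, math.floor(l / m) + 1) for l, m in zip(lam, mu)]
    if sum(servers) > budget:
        raise ValueError("budget too small to keep every operator stable")
    while sum(servers) < budget:
        best, best_t = 0, math.inf
        for i in range(len(servers)):
            trial = servers.copy()
            trial[i] += 1
            t = mean_response_time(gamma, routing, mu, trial)
            if t < best_t:
                best, best_t = i, t
        servers[best] += 1
    return servers
```

For a two-operator chain with external rate 2.0, service rates 3.0 and 1.5, and a budget of four processors, greedy_allocate([2.0, 0.0], [[0.0, 1.0], [0.0, 0.0]], [3.0, 1.5], 4) returns [2, 2]: the spare processor goes to the first operator, because cutting its M/M/1 delay of 1.0 saves more than adding a third processor to the second operator would.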
Enabling Adaptive Grid Scheduling and Resource Management
Wider adoption of the Grid concept has led to an increasing number of federated
computational, storage and visualisation resources being available to scientists and
researchers. The distributed and heterogeneous nature of these resources renders most
legacy cluster monitoring and management approaches inappropriate, and poses new
challenges in workflow scheduling on such systems. Effective resource utilisation monitoring
and highly granular yet adaptive measurements are prerequisites for a more efficient Grid
scheduler. We present a suite of measurement applications able to monitor per-process
resource utilisation, and a customisable tool for emulating observed utilisation models. We
also outline our future work on a predictive and probabilistic Grid scheduler. The research is
being undertaken as part of the UK e-Science EPSRC-sponsored project SO-GRM (Self-Organising
Grid Resource Management) in cooperation with BT.
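The abstract does not describe how the measurement and emulation tools work, so this is only a minimal Python sketch of the two ideas, under stated assumptions: utilisation is taken to be CPU time divided by wall-clock time over a sampling interval, and only the calling process is observed (the project's tools monitor arbitrary processes, which this does not attempt). The emulator reproduces a target utilisation with a simple busy/sleep duty cycle. busy_spin, emulate_utilisation and sample_utilisation are hypothetical names.

```python
import time
from typing import Callable, List


def busy_spin(seconds: float) -> None:
    """Burn CPU in a tight loop for roughly `seconds` of wall-clock time."""
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        pass


def emulate_utilisation(target: float, period: float = 0.05) -> None:
    """Hypothetical duty-cycle emulator: spin for target*period, then sleep the rest."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be in [0, 1]")
    busy_spin(target * period)
    time.sleep((1.0 - target) * period)


def sample_utilisation(step: Callable[[], None], n_samples: int) -> List[float]:
    """Run step() n_samples times and record this process's CPU time / wall time
    for each interval (an assumed definition of utilisation)."""
    samples = []
    for _ in range(n_samples):
        cpu0, wall0 = time.process_time(), time.perf_counter()
        step()
        cpu1, wall1 = time.process_time(), time.perf_counter()
        wall = wall1 - wall0
        samples.append((cpu1 - cpu0) / wall if wall > 0 else 0.0)
    return samples
```

Calling sample_utilisation(lambda: emulate_utilisation(0.3), 5) should report values near 0.3, with some noise from the operating system's own scheduling.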
Scheduling Storms and Streams in the Cloud
Motivated by emerging big streaming data processing paradigms (e.g., Twitter
Storm, Streaming MapReduce), we investigate the problem of scheduling graphs
over a large cluster of servers. Each graph is a job, where nodes represent
compute tasks and edges indicate data-flows between these compute tasks. Jobs
(graphs) arrive randomly over time, and upon completion, leave the system. When
a job arrives, the scheduler needs to partition the graph and distribute it
over the servers to satisfy load balancing and cost considerations.
Specifically, neighboring compute tasks in the graph that are mapped to
different servers incur load on the network; thus a mapping of the jobs among
the servers incurs a cost that is proportional to the number of "broken edges".
We propose a low complexity randomized scheduling algorithm that, without
service preemptions, stabilizes the system with graph arrivals/departures; more
importantly, it allows a smooth trade-off between minimizing average
partitioning cost and average queue lengths. Interestingly, to avoid service
preemptions, our approach does not rely on a Gibbs sampler; instead, we show
that the corresponding limiting invariant measure has an interpretation
stemming from a loss system.
Comment: 14 pages
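The cost model in this abstract can be illustrated with a small Python sketch: given a placement of compute tasks on servers, count the broken edges (data-flows whose endpoints sit on different servers) and the load of the busiest server. It only evaluates a given placement; it is not the paper's randomized scheduler, and broken_edges and max_load are hypothetical names.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def broken_edges(edges: Iterable[Edge], placement: Dict[Hashable, int]) -> int:
    """Number of data-flow edges whose two tasks are mapped to different servers."""
    return sum(1 for u, v in edges if placement[u] != placement[v])


def max_load(placement: Dict[Hashable, int]) -> int:
    """Number of tasks on the most heavily loaded server."""
    return max(Counter(placement.values()).values())
```

For a four-task pipeline a-b-c-d, placing every task on one server gives 0 broken edges but a maximum load of 4, while splitting it as {a, b} and {c, d} gives 1 broken edge and a maximum load of 2; this is the tension between partitioning cost and load balancing that the scheduler has to trade off.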
Approximation Algorithms for Energy Minimization in Cloud Service Allocation under Reliability Constraints
We consider allocation problems that arise in the context of service
allocation in Clouds. More specifically, we assume, on the one hand, that each
computing resource is associated with a capacity constraint, which can be chosen
using the Dynamic Voltage and Frequency Scaling (DVFS) method, and with a probability
of failure. On the other hand, we assume that the service runs as a set of
independent instances of identical Virtual Machines. Moreover, there exists a
Service Level Agreement (SLA) between the Cloud provider and the client that
can be expressed as follows: the client comes with a minimal number of service
instances which must be alive at the end of the day, and the Cloud provider
offers a list of (price, compensation) pairs, the compensation being paid by
the Cloud provider if it fails to keep the required number of services alive.
On the Cloud provider's side, each pair actually corresponds to a guaranteed
success probability of fulfilling the constraint on the minimal number of
instances. In this context, given a minimal number of instances and a
probability of success, the question for the Cloud provider is to find the
number of necessary resources, their clock frequency and an allocation of the
instances (possibly using replication) onto machines. This solution should
satisfy all types of constraints during a given time period while minimizing
the energy consumption of used resources. We consider two energy consumption
models based on DVFS techniques, where the clock frequency of physical
resources can be changed. For each allocation problem and each energy model, we
prove deterministic approximation ratios on the consumed energy for algorithms
that provide guaranteed failure probabilities, and we also present an efficient
heuristic whose energy ratio is not guaranteed.
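The reliability side of this problem reduces to a binomial tail. A minimal Python sketch, covering only that part and ignoring replication and the DVFS energy models, assuming each instance runs on its own machine and machines fail independently with the same probability p_fail: the provider needs the smallest n such that P(at least k of n machines survive) >= target. survival_probability and min_machines are hypothetical names.

```python
import math


def survival_probability(n: int, k: int, p_fail: float) -> float:
    """P(at least k of n independent machines survive), each failing with prob p_fail."""
    q = 1.0 - p_fail
    return sum(math.comb(n, i) * q ** i * p_fail ** (n - i) for i in range(k, n + 1))


def min_machines(k: int, p_fail: float, target: float, n_max: int = 1000) -> int:
    """Smallest n >= k (one instance per machine) with P(>= k alive) >= target.

    n_max stays at 1000 so that math.comb(n, i) still fits in a float.
    """
    for n in range(k, n_max + 1):
        if survival_probability(n, k, p_fail) >= target:
            return n
    raise ValueError("target not reachable within n_max machines")
```

For k = 2 required instances, p_fail = 0.1 and a 0.99 success target, three machines give 0.972 and four give 0.9963, so min_machines(2, 0.1, 0.99) returns 4; each extra machine then costs energy, which is where the DVFS frequency choice in the paper comes in.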
Predicting Scheduling Failures in the Cloud
Cloud Computing has emerged as a key technology to deliver and manage
computing, platform, and software services over the Internet. Task scheduling
algorithms play an important role in the efficiency of cloud computing services
as they aim to reduce the turnaround time of tasks and improve resource
utilization. Several task scheduling algorithms have been proposed in the
literature for cloud computing systems, the majority relying on the
computational complexity of tasks and the distribution of resources. However,
several tasks scheduled following these algorithms still fail because of
unforeseen changes in the cloud environments. In this paper, using task
execution and resource utilization data extracted from the execution traces of
real-world applications at Google, we explore the possibility of predicting the
scheduling outcome of a task using statistical models. If we can successfully
predict task failures, we may be able to reduce the execution time of jobs by
rescheduling failed tasks earlier (i.e., before their actual failing time). Our
results show that statistical models can predict task failures with a precision
of up to 97.4% and a recall of up to 96.2%. We simulate the potential benefits of
such predictions using the GloudSim toolkit and find that they can increase
the number of finished tasks by up to 40%. We also perform a case study using
the Hadoop framework of Amazon Elastic MapReduce (EMR) and the jobs of a gene
expression correlations analysis study from breast cancer research. We find
that when extending the scheduler of Hadoop with our predictive models, the
percentage of failed jobs can be reduced by up to 45%, with an overhead of less
than 5 minutes.
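The figures above are precision (the share of predicted failures that really fail) and recall (the share of real failures that were predicted). A small Python sketch of both metrics, treating "task fails" as the positive class; precision_recall is a hypothetical helper, not the authors' evaluation code.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall for the positive class ("task fails")."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(t and p for t, p in zip(y_true, y_pred))          # predicted fail, did fail
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))    # predicted fail, succeeded
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))    # missed failure
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With three real failures of which two are flagged and no false alarms, precision is 1.0 and recall is 2/3. High precision matters for rescheduling because every false alarm restarts a task that would have finished.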