14,232 research outputs found
D-SPACE4Cloud: A Design Tool for Big Data Applications
The last years have seen a steep rise in data generation worldwide, with the
development and widespread adoption of several software projects targeting the
Big Data paradigm. Many companies currently engage in Big Data analytics as
part of their core business activities, nonetheless there are no tools and
techniques to support the design of the underlying hardware configuration
backing such systems. In particular, the focus in this report is set on Cloud
deployed clusters, which represent a cost-effective alternative to on premises
installations. We propose a novel tool implementing a battery of optimization
and prediction techniques integrated so as to efficiently assess several
alternative resource configurations, in order to determine the minimum cost
cluster deployment satisfying QoS constraints. Further, the experimental
campaign conducted on real systems shows the validity and relevance of the
proposed method
How to Price Shared Optimizations in the Cloud
Data-management-as-a-service systems are increasingly being used in
collaborative settings, where multiple users access common datasets. Cloud
providers have the choice to implement various optimizations, such as indexing
or materialized views, to accelerate queries over these datasets. Each
optimization carries a cost and may benefit multiple users. This creates a
major challenge: how to select which optimizations to perform and how to share
their cost among users. The problem is especially challenging when users are
selfish and will only report their true values for different optimizations if
doing so maximizes their utility. In this paper, we present a new approach for
selecting and pricing shared optimizations by using Mechanism Design. We first
show how to apply the Shapley Value Mechanism to the simple case of selecting
and pricing additive optimizations, assuming an offline game where all users
access the service for the same time-period. Second, we extend the approach to
online scenarios where users come and go. Finally, we consider the case of
substitutive optimizations. We show analytically that our mechanisms induce
truth- fulness and recover the optimization costs. We also show experimentally
that our mechanisms yield higher utility than the state-of-the-art approach
based on regret accumulation.Comment: VLDB201
Reducing Electricity Demand Charge for Data Centers with Partial Execution
Data centers consume a large amount of energy and incur substantial
electricity cost. In this paper, we study the familiar problem of reducing data
center energy cost with two new perspectives. First, we find, through an
empirical study of contracts from electric utilities powering Google data
centers, that demand charge per kW for the maximum power used is a major
component of the total cost. Second, many services such as Web search tolerate
partial execution of the requests because the response quality is a concave
function of processing time. Data from Microsoft Bing search engine confirms
this observation.
We propose a simple idea of using partial execution to reduce the peak power
demand and energy cost of data centers. We systematically study the problem of
scheduling partial execution with stringent SLAs on response quality. For a
single data center, we derive an optimal algorithm to solve the workload
scheduling problem. In the case of multiple geo-distributed data centers, the
demand of each data center is controlled by the request routing algorithm,
which makes the problem much more involved. We decouple the two aspects, and
develop a distributed optimization algorithm to solve the large-scale request
routing problem. Trace-driven simulations show that partial execution reduces
cost by for one data center, and by for geo-distributed
data centers together with request routing.Comment: 12 page
H2O: An Autonomic, Resource-Aware Distributed Database System
This paper presents the design of an autonomic, resource-aware distributed
database which enables data to be backed up and shared without complex manual
administration. The database, H2O, is designed to make use of unused resources
on workstation machines. Creating and maintaining highly-available, replicated
database systems can be difficult for untrained users, and costly for IT
departments. H2O reduces the need for manual administration by autonomically
replicating data and load-balancing across machines in an enterprise.
Provisioning hardware to run a database system can be unnecessarily costly as
most organizations already possess large quantities of idle resources in
workstation machines. H2O is designed to utilize this unused capacity by using
resource availability information to place data and plan queries over
workstation machines that are already being used for other tasks. This paper
discusses the requirements for such a system and presents the design and
implementation of H2O.Comment: Presented at SICSA PhD Conference 2010 (http://www.sicsaconf.org/
- …