Search CORE

13,168 research outputs found

Distributed top-k aggregation queries at large

Author: A. Marian
Gerhard Weikum
H. David
I.F. Ilyas
K. Church
K. Schnaitter
Matthias Bender
N. Bruno
Peter Triantafillou
R. Akbarinia
R. Fagin
Ralf Schenkel
S. Chaudhuri
S. Madden
Sebastian Michel
T. Cormen
Thomas Neumann
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009
Field of study

Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Springer - Publisher Connector

Enlighten

MPG.PuRe

Reducing Electricity Demand Charge for Data Centers with Partial Execution

Author: Bash C.
Bertsekas D. P.
Boyd S.
Chen Y.
Gong Z.
Madhyastha H. V.
Zhou R.
Publication venue
Publication date: 17/12/2013
Field of study

Data centers consume a large amount of energy and incur substantial electricity cost. In this paper, we study the familiar problem of reducing data center energy cost with two new perspectives. First, we find, through an empirical study of contracts from electric utilities powering Google data centers, that demand charge per kW for the maximum power used is a major component of the total cost. Second, many services such as Web search tolerate partial execution of the requests because the response quality is a concave function of processing time. Data from Microsoft Bing search engine confirms this observation. We propose a simple idea of using partial execution to reduce the peak power demand and energy cost of data centers. We systematically study the problem of scheduling partial execution with stringent SLAs on response quality. For a single data center, we derive an optimal algorithm to solve the workload scheduling problem. In the case of multiple geo-distributed data centers, the demand of each data center is controlled by the request routing algorithm, which makes the problem much more involved. We decouple the two aspects, and develop a distributed optimization algorithm to solve the large-scale request routing problem. Trace-driven simulations show that partial execution reduces cost by

3\%--10.5\%

for one data center, and by

15.5\%

for geo-distributed data centers together with request routing.Comment: 12 page

arXiv.org e-Print Archive

Crossref

D-SPACE4Cloud: A Design Tool for Big Data Applications

Author: A Aleti
A Castiglione
A Verma
E Vianna
ED Lazowska
F Brosig
HV Jagadish
K Kambatla
KH Lee
M Bertoli
M Malekimajd
M Tribastone
MR Garey
S Becker
W Zhang
Z Zhang
Publication venue
Publication date: 01/01/2016
Field of study

The last years have seen a steep rise in data generation worldwide, with the development and widespread adoption of several software projects targeting the Big Data paradigm. Many companies currently engage in Big Data analytics as part of their core business activities, nonetheless there are no tools and techniques to support the design of the underlying hardware configuration backing such systems. In particular, the focus in this report is set on Cloud deployed clusters, which represent a cost-effective alternative to on premises installations. We propose a novel tool implementing a battery of optimization and prediction techniques integrated so as to efficiently assess several alternative resource configurations, in order to determine the minimum cost cluster deployment satisfying QoS constraints. Further, the experimental campaign conducted on real systems shows the validity and relevance of the proposed method

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref