98 research outputs found

Database Workload Management (Dagstuhl Seminar 12282)

Author: Babu Shivnath
Graefe Goetz
Kuno Harumi Anne
Publication venue: Dagstuhl Reports, Volume 2, Issue 7
Publication date: 01/01/2012

This report documents the program and the outcomes of Dagstuhl Seminar 12282 "Database Workload Management". The seminar was designed to provide a venue where researchers can engage in dialogue with industrial participants for an in-depth exploration of challenging industrial workloads, and where industrial participants can challenge researchers to apply the lessons learned from their large-scale experiments to multiple real systems. It was also meant to facilitate the release of real workloads that can drive future research, along with concrete measures to evaluate and compare workload management techniques in the context of these workloads.

Dagstuhl Research Online Publication Server

Stubby: A Transformation-based Optimizer for MapReduce Workflows

Author: Babu Shivnath
Herodotou Herodotos
Lim Harold
Publication date: 01/01/2012

There is a growing trend of performing analysis on large datasets using workflows composed of MapReduce jobs connected through producer-consumer relationships based on data. This trend has spurred the development of a number of interfaces--ranging from program-based to query-based interfaces--for generating MapReduce workflows. Studies have shown that the gap in performance can be quite large between optimized and unoptimized workflows. However, automatic cost-based optimization of MapReduce workflows remains a challenge due to the multitude of interfaces, large size of the execution plan space, and the frequent unavailability of all types of information needed for optimization. We introduce a comprehensive plan space for MapReduce workflows generated by popular workflow generators. We then propose Stubby, a cost-based optimizer that searches selectively through the subspace of the full plan space that can be enumerated correctly and costed based on the information available in any given setting. Stubby enumerates the plan space based on plan-to-plan transformations and an efficient search algorithm. Stubby is designed to be extensible to new interfaces and new types of optimizations, which is a desirable feature given how rapidly MapReduce systems are evolving. Stubby's efficiency and effectiveness have been evaluated using representative workflows from many domains. Comment: VLDB201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Ktisis

Speedup Your Analytics : Automatic Parameter Tuning for Databases and Big Data Systems

Author: Babu Shivnath
Chen Yuxing
Herodotou Herodotos
Lu Jiaheng
Publication date: 01/08/2019

Database and big data analytics systems such as Hadoop and Spark have a large number of configuration parameters that control memory distribution, I/O optimization, parallelism, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune them to achieve good performance. In this tutorial, we review existing approaches to automatic parameter tuning for databases, Hadoop, and Spark, which we classify into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We describe the foundations of different automatic parameter tuning algorithms and present the pros and cons of each approach. We also highlight real-world applications and systems, and identify research challenges for handling cloud services, resource heterogeneity, and real-time analytics.

Ktisis

Helsingin yliopiston digitaalinen arkisto

Toward Self-Healing Multitier Services

Author: Babu Shivnath
Candea George
Cook Brian
Duan Songyun
Publication date: 11/11/2008

Are self-healing database-centric multitier services utopia or just a hard puzzle? We argue for the latter and aim to identify the missing pieces of this puzzle. We advocate robust and scalable learning-based approaches to self-healing that we expect to work well for a large class of multitier services. We identify performance-availability problems (PAPs) as the most relevant target for self-healing, and argue that PAPs are best addressed macroscopically, outside the realm of individual tiers. Finally, we lay out a research agenda for learning-based approaches to self-healing, to enable wider deployment of self-healing multitier services.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Cumulon: Cloud-based statistical analysis from users perspective.

Author: Botong Huang
Jun Yang
Nicholas W D Jarrett
Sayan Mukherjee
Shivnath Babu
Publication date: 01/01/2014

Cumulon is a system aimed at simplifying the developmen

CiteSeerX

Towards Automatic Optimization of MapReduce Programs

Author: Shivnath Babu
Publication date: 01/01/2010

Timely and cost-effective processing of large datasets has become a critical ingredient for the success of many academic, government, and industrial organizations. The combination of MapReduce frameworks and cloud computing is an attractive proposition for these organizations. However, even to run a single program in a MapReduce framework, a number of tuning parameters have to be set by users or system administrators. Users often run into performance problems because they don’t know how to set these parameters, or because they don’t even know that these parameters exist. With MapReduce being a relatively new technology, it is not easy to find qualified administrators. In this position paper, we make a case for techniques to automate the setting of tuning parameters for MapReduce programs. The objective is to provide good out-of-the-box performance for ad hoc MapReduce programs run on large datasets. This feature can go a long way towards improving the productivity of users who cannot optimize programs themselves because they are unfamiliar with MapReduce or with the data being processed.

CiteSeerX

Crossref