67 research outputs found
Database Workload Management (Dagstuhl Seminar 12282)
This report documents the program and the outcomes of Dagstuhl Seminar 12282 "Database Workload Management". Dagstuhl Seminar 12282 was designed to
provide a venue where researchers can engage in dialogue with industrial
participants for an in-depth exploration of challenging industrial
workloads, where industrial participants can challenge researchers to
apply the lessons-learned from their large-scale experiments to multiple
real systems, and that would facilitate the release of real workloads that can be used to drive future research, and concrete measures to evaluate and compare workload management techniques in the context of these workloads
Speedup Your Analytics : Automatic Parameter Tuning for Databases and Big Data Systems
Database and big data analytics systems such as Hadoop and Spark have a large number of configuration parameters that control memory distribution, I/O optimization, parallelism, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune them to achieve good performance. In this tutorial, we review existing approaches on automatic parameter tuning for databases, Hadoop, and Spark, which we classify into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We describe the foundations of different automatic parameter tuning algorithms and present pros and cons of each approach. We also highlight real-world applications and systems, and identify research challenges for handling cloud services, resource heterogeneity, and real-time analytics.Peer reviewe
Toward Self-Healing Multitier Services
Are self-healing database-centric multitier services utopia or just a hard puzzle? We argue for the latter and aim to identify the missing pieces of this puzzle. We advocate robust and scalable learning-based approaches to self-healing that we expect to work well for a large class of multitier services. We identify performance-availability problems (PAPs) as the most relevant target for self-healing, and argue that PAPs are best addressed macroscopically, outside the realm of individual tiers. Finally, we lay out a research agenda for learning-based approaches to self-healing, to enable wider deployment of self-healing multi-tier services
Cumulon: Cloudbased statistical analysis from users perspective.
Abstract Cumulon is a system aimed at simplifying the developmen
Adaptive query processing in the looking glass
A great deal of work on adaptive query processing has been done over the last few years: Adaptive query processing has been used to detect and correct optimizer errors due to incorrect statistics or simplified cost metrics; it has been applied to long-running continuous queries over data streams whose characteristics vary over time; and routing-based adaptive query processing does away with the optimizer altogether. Despite this large body of interrelated work, no unifying comparison of adaptive query processing techniques or systems has been attempted; we tackle this problem. We identify three families of systems (plan-based, CQ-based, and routingbased), and compare them in detail with respect to the most important aspects of adaptive query processing: plan quality, statistics monitoring and re-optimization, plan migration, and scalability. We also suggest two new approaches to adaptive query processing that address some of the shortcomings revealed by our in-depth analysis: (1) Proactive re-optimization, where the optimizer chooses query plans with the expectation of reoptimization; and (2) Plan logging, where optimizer decisions under different conditions are logged over time, enabling plan reuse as well as analysis of relevant statistics and benefits of adaptivity.
- …