Deep Reinforcement Learning for Join Order Enumeration
Join order selection plays a significant role in query performance. However,
modern query optimizers typically employ static join enumeration algorithms
that do not receive any feedback about the quality of the resulting plan.
Hence, optimizers often repeatedly choose the same bad plan, as they do not
have a mechanism for "learning from their mistakes". In this paper, we argue
that existing deep reinforcement learning techniques can be applied to address
this challenge. These techniques, powered by artificial neural networks, can
automatically improve decision making by incorporating feedback from their
successes and failures. Towards this goal, we present ReJOIN, a
proof-of-concept join enumerator, and present preliminary results indicating
that ReJOIN can match or outperform the PostgreSQL optimizer in terms of plan
quality and join enumeration efficiency.
Choreo: network-aware task placement for cloud applications
Cloud computing infrastructures are increasingly being used by network-intensive applications that transfer significant amounts of data between the nodes on which they run. This paper shows that tenants can do a better job placing applications by understanding the underlying cloud network as well as the demands of the applications. To do so, tenants must be able to quickly and accurately measure the cloud network and profile their applications, and then use a network-aware placement method to place applications. This paper describes Choreo, a system that solves these problems. Our experiments measure Amazon's EC2 and Rackspace networks and use three weeks of network data from applications running on the HP Cloud network. We find that Choreo reduces application completion time by an average of 8%-14% (max improvement: 61%) when applications are placed all at once, and 22%-43% (max improvement: 79%) when they arrive in real-time, compared to alternative placement schemes. National Science Foundation (U.S.) (Grant 0645960); National Science Foundation (U.S.) (Grant 1065219); National Science Foundation (U.S.) (Grant 1040072)
QuickSel: Quick Selectivity Learning with Mixture Models
Estimating the selectivity of a query is a key step in almost any cost-based
query optimizer. Most of today's databases rely on histograms or samples that
are periodically refreshed by re-scanning the data as the underlying data
changes. Since frequent scans are costly, these statistics are often stale and
lead to poor selectivity estimates. As an alternative to scans, query-driven
histograms have been proposed, which refine the histograms based on the actual
selectivities of the observed queries. Unfortunately, these approaches are
either too costly to use in practice (i.e., they require an exponential number
of buckets) or quickly lose their advantage as they observe more queries.
In this paper, we propose a selectivity learning framework, called QuickSel,
which falls into the query-driven paradigm but does not use histograms.
Instead, it builds an internal model of the underlying data, which can be
refined significantly faster (e.g., only 1.9 milliseconds for 300 queries).
This fast refinement allows QuickSel to continuously learn from each query and
yield increasingly more accurate selectivity estimates over time. Unlike
query-driven histograms, QuickSel relies on a mixture model and a new
optimization algorithm for training its model. Our extensive experiments on two
real-world datasets confirm that, given the same target accuracy, QuickSel is
34.0x-179.4x faster than state-of-the-art query-driven histograms, including
ISOMER and STHoles. Further, given the same space budget, QuickSel is
26.8%-91.8% more accurate than periodically-updated histograms and samples,
respectively.
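As an illustration of how a mixture model can answer a selectivity query, here is a minimal sketch. It assumes each mixture component is a uniform distribution over an axis-aligned box; the abstract does not say what the components are, so this is an assumption. The names `Component`, `volume`, `intersect`, and `estimate_selectivity` are hypothetical and do not come from QuickSel. Training the weights from observed queries, which the abstract attributes to a new optimization algorithm, is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

# One (low, high) interval per dimension (assumed representation).
Box = List[Tuple[float, float]]


def volume(box: Box) -> float:
    """Volume of an axis-aligned box; empty intervals contribute zero."""
    v = 1.0
    for lo, hi in box:
        v *= max(0.0, hi - lo)
    return v


def intersect(a: Box, b: Box) -> Box:
    """Per-dimension overlap of two boxes (may be empty, i.e. low > high)."""
    return [(max(al, bl), min(ah, bh)) for (al, ah), (bl, bh) in zip(a, b)]


@dataclass
class Component:
    box: Box       # support of the uniform component
    weight: float  # mixture weight; weights are expected to sum to 1


def estimate_selectivity(components: List[Component], query: Box) -> float:
    """Estimated fraction of rows falling inside the range predicate `query`.

    Each uniform component contributes its weight times the fraction of its
    box that the query covers.
    """
    total = 0.0
    for c in components:
        v = volume(c.box)
        if v > 0.0:
            total += c.weight * volume(intersect(c.box, query)) / v
    return total
```

For example, with two equally weighted components over the boxes [0,1]x[0,1] and [0,2]x[0,2], a query over [0,1]x[0,1] covers all of the first box and a quarter of the second, so the estimate is 0.5 + 0.125 = 0.625.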
Occupant productivity and office indoor environment quality : a review of the literature
The purpose of this paper is to review the existing literature to draw an understanding of the relationship between indoor environmental quality and occupant productivity in an office environment. The study reviews over 300 papers from 67 journals, conference articles and books focusing on indoor environment, occupant comfort, productivity and green buildings. It limits its focus to the physical aspects of an office environment. The literature outlines eight Indoor Environmental Quality (IEQ) factors that influence occupant productivity in an office environment. It also discusses different physical parameters under each of the IEQ factors. It proposes a conceptual model of different factors affecting occupant productivity. The study also presents a review of the data collection methods utilised by the research studies that aim to investigate the relationship between IEQ and occupant productivity. The study presents a comprehensive discussion and analysis of different IEQ factors that affect occupant productivity. The paper provides a concise starting point for future researchers interested in the area of indoor environmental quality.
Interaction-aware scheduling of report generation workloads
The typical workload in a database system consists of a mix of multiple queries of different types that run concurrently. Interactions among the different queries in a query mix can have a significant impact on database performance. Hence, optimizing database performance requires reasoning about query mixes rather than considering queries individually. Current database systems lack the ability to do such reasoning. We propose a new approach based on planning experiments and statistical modeling to capture the impact of query interactions. Our approach requires no prior assumptions about the internal workings of the database system or the nature and cause of query interactions, making it portable across systems. To demonstrate the potential of modeling and exploiting query interactions, we have developed a novel interaction-aware query scheduler for report-generation workloads. Our scheduler, called QShuffler, uses two query scheduling algorithms that leverage models of query interactions. The first algorithm is optimized for workloads where queries are submitted in large batches. The second algorithm targets workloads where queries arrive continuously, and scheduling decisions have to be made on-line. We report an experimental evaluation of QShuffler using TPC-H workloads running on IBM DB2. The evaluation shows that QShuffler, by modeling and exploiting query interactions, can consistently out
Database virtualization: A new frontier for database tuning and physical design
Resource virtualization is currently being employed at all levels of the IT infrastructure to improve provisioning and manageability, with the goal of reducing total cost of ownership. This means that database systems will increasingly be run in virtualized environments, inside virtual machines. This has many benefits, but it also introduces new tuning and physical design problems that are of interest to the database research community. In this paper, we discuss how virtualization can benefit database systems, and we present the tuning problems it introduces, which relate to setting the new “tuning knobs” that control resource allocation to virtual machines in the virtualized environment. We present a formulation of the virtualization design problem, which focuses on setting resource allocation levels for different database workloads statically at deployment and configuration time. An important component of the solution to this problem is modeling the cost of a workload for a given resource allocation. We present an approach to this cost modeling that relies on using the query optimizer in a special virtualization-aware “what-if” mode. We also discuss the next steps in solving this problem, and present some long-term research directions.