Testing the Accuracy of Query Optimizers
The accuracy of a query optimizer is intricately connected with a database system's performance and its operational cost: the more accurate the optimizer's cost model, the better the resulting execution plans. Database application programmers and other practitioners have long provided anecdotal evidence that database systems differ widely with respect to the quality of their optimizers, yet, to date, no formal method is available to database users to assess or refute such claims. In this paper, we develop a framework to quantify an optimizer's accuracy for a given workload. We make use of the fact that optimizers expose switches or hints that let users influence the plan choice and generate plans other than the default plan. Using these mechanisms, we force the generation of multiple alternative plans for each test case, time the execution of all alternatives, and rank the plans by their effective costs. We compare this ranking with the ranking by estimated cost and compute a score for the accuracy of the optimizer. We present initial results of an anonymized comparison of several major commercial database systems, demonstrating that there are in fact substantial differences between systems. We also suggest ways to incorporate this knowledge into the commercial development process.
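The ranking comparison at the core of the framework lends itself to a compact illustration. The sketch below is only an assumed rendering of the idea: it scores one test case by the fraction of plan pairs that the estimated costs order the same way as the measured runtimes (a Kendall-tau-style statistic); the paper's actual scoring formula and plan-forcing mechanics are not reproduced here.

```python
# Minimal sketch: score an optimizer's accuracy for one test query by comparing
# the ranking induced by its cost estimates with the ranking of measured runtimes.
# The scoring rule (normalized count of concordant plan pairs) is an illustrative
# assumption, not the paper's metric.
from itertools import combinations

def accuracy_score(plans):
    """plans: list of (estimated_cost, measured_runtime) pairs for alternative
    plans forced via optimizer hints/switches. Returns a value in [0, 1]."""
    pairs = list(combinations(range(len(plans)), 2))
    if not pairs:
        return 1.0
    concordant = 0
    for i, j in pairs:
        est_order = plans[i][0] - plans[j][0]
        run_order = plans[i][1] - plans[j][1]
        if est_order * run_order > 0:        # estimates order this pair correctly
            concordant += 1
    return concordant / len(pairs)

# Example: three alternative plans for one query (estimated cost, observed seconds).
print(accuracy_score([(100, 1.2), (250, 3.0), (400, 2.1)]))  # 2 of 3 pairs ordered correctly
```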
Stubby: A Transformation-based Optimizer for MapReduce Workflows
There is a growing trend of performing analysis on large datasets using workflows composed of MapReduce jobs connected through producer-consumer relationships based on data. This trend has spurred the development of a number of interfaces--ranging from program-based to query-based interfaces--for generating MapReduce workflows. Studies have shown that the gap in performance can be quite large between optimized and unoptimized workflows. However, automatic cost-based optimization of MapReduce workflows remains a challenge due to the multitude of interfaces, large size of the execution plan space, and the frequent unavailability of all types of information needed for optimization. We introduce a comprehensive plan space for MapReduce workflows generated by popular workflow generators. We then propose Stubby, a cost-based optimizer that searches selectively through the subspace of the full plan space that can be enumerated correctly and costed based on the information available in any given setting. Stubby enumerates the plan space based on plan-to-plan transformations and an efficient search algorithm. Stubby is designed to be extensible to new interfaces and new types of optimizations, which is a desirable feature given how rapidly MapReduce systems are evolving. Stubby's efficiency and effectiveness have been evaluated using representative workflows from many domains.
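As a rough intuition for transformation-based plan search, the following sketch applies plan-to-plan transformations greedily under a cost model. The plan representation, the transformations, and the greedy hill-climbing loop are all illustrative assumptions; Stubby's actual enumeration and costing are considerably more sophisticated.

```python
# Illustrative sketch of transformation-based plan search in the spirit of Stubby.
# Plan representation, transformations, and cost model are all hypothetical.
def search(initial_plan, transformations, cost, max_iters=100):
    """Greedy hill-climbing over plan-to-plan transformations:
    repeatedly apply the transformation that most reduces estimated cost."""
    best = initial_plan
    for _ in range(max_iters):
        candidates = [t(best) for t in transformations]
        candidates = [p for p in candidates if p is not None]  # drop inapplicable ones
        if not candidates:
            break
        challenger = min(candidates, key=cost)
        if cost(challenger) >= cost(best):
            break                      # local optimum reached
        best = challenger
    return best

# Toy usage: plans are tuples of jobs; one transformation merges the first two
# jobs (a stand-in for "vertical packing"), which the toy cost model rewards.
toy_cost = lambda plan: len(plan) * 10
merge_first_two = lambda plan: (("merged",) + plan[2:]) if len(plan) >= 2 else None
print(search(("job1", "job2", "job3"), [merge_first_two], toy_cost))  # ('merged',)
```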
Design-time performance testing
Software designers make decisions between alternate approaches early in the development of a software application and these decisions can be difficult to change later. Designers make these decisions based on estimates of how alternatives affect software qualities. One software quality that can be difficult to predict is performance, that is, the efficient use of resources in the system. It is particularly challenging to estimate the performance of large, interconnected software systems composed of components. With the proliferation of class libraries, middle-ware systems, web services, and third party components, many software projects rely on third party services to meet their requirements. Often choosing between services involves considering both the functionality and performance of the services. To help software developers compare their designs and third-party services, I propose using performance prototypes of alternatives and test suites to estimate performance trade-offs early in the development cycle, a process called Design-Time Performance Testing (DTPT).
Providing software designers with performance evidence based on prototypes will allow them to make informed decisions regarding performance trade-offs. To show how DTPT can help inform real design decisions, this thesis contributes a process for DTPT, a framework implementation written in Java, and experiments to verify and validate the process and implementation. The implemented framework assists in designing, running, and documenting performance test suites, allowing designers to make accurate comparisons between alternate approaches. Performance metrics are captured by instrumenting and running prototypes.
This thesis describes the process and framework for gathering software performance estimates at design-time using prototypes and test suites.
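The core DTPT workflow, running the same test suite against prototype implementations of each design alternative and comparing measured timings, can be sketched in a few lines. The snippet below is a minimal illustration in Python (the thesis framework itself is written in Java), and the two "alternatives" are hypothetical stand-ins.

```python
# Sketch of the DTPT idea: run the same test suite against two prototypes of a
# design alternative and compare measured timings. Illustrative only; the real
# framework described in the thesis is a Java test-suite harness.
import statistics
import time

def time_prototype(prototype, test_suite, repetitions=5):
    """Return the median wall-clock time (seconds) of running the test suite."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        for test_input in test_suite:
            prototype(test_input)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Hypothetical alternatives: two candidate implementations of the same requirement.
test_suite = [list(range(n, 0, -1)) for n in (100, 1_000, 10_000)]
alt_a = lambda data: sorted(data)                          # built-in sort
alt_b = lambda data: sorted(data, key=lambda x: -x)[::-1]  # deliberately wasteful variant
print("Alternative A:", time_prototype(alt_a, test_suite))
print("Alternative B:", time_prototype(alt_b, test_suite))
```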
Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training
Training deep neural networks (DNNs) is becoming increasingly resource- and energy-intensive every year. Unfortunately, existing works primarily focus on optimizing DNN training for faster completion, often without considering the impact on energy efficiency.
In this paper, we observe that common practices to improve training performance can often lead to inefficient energy usage. More importantly, we demonstrate that there is a tradeoff between energy consumption and performance optimization. To this end, we propose Zeus, an optimization framework to navigate this tradeoff by automatically finding optimal job- and GPU-level configurations for recurring DNN training jobs. Zeus uses an online exploration-exploitation approach in conjunction with just-in-time energy profiling, averting the need for expensive offline measurements, while adapting to data drifts over time. Our evaluation shows that Zeus can improve the energy efficiency of DNN training by 15.3%-75.8% for diverse workloads. (NSDI 2023; homepage: https://ml.energy/zeu)
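A minimal sketch of the exploration-exploitation idea: across recurrences of a training job, try GPU power-limit configurations and prefer the one minimizing a weighted cost of measured energy and time. The epsilon-greedy selection rule and the cost definition below are simplifying assumptions, not Zeus's actual algorithm.

```python
# Simplified sketch of configuration tuning for recurring training jobs.
# The epsilon-greedy rule and the energy/time cost are illustrative assumptions.
import random

class ConfigTuner:
    def __init__(self, power_limits, eta=0.5, epsilon=0.1):
        self.power_limits = power_limits
        self.eta = eta                      # energy vs. time trade-off knob
        self.epsilon = epsilon
        self.observations = {p: [] for p in power_limits}

    def cost(self, energy_j, time_s):
        return self.eta * energy_j + (1 - self.eta) * time_s

    def choose(self):
        """Explore untried (or random) configs; otherwise exploit the best so far."""
        untried = [p for p in self.power_limits if not self.observations[p]]
        if untried or random.random() < self.epsilon:
            return random.choice(untried or self.power_limits)   # explore
        avg = lambda p: sum(self.observations[p]) / len(self.observations[p])
        return min(self.power_limits, key=avg)                   # exploit

    def record(self, power_limit, energy_j, time_s):
        self.observations[power_limit].append(self.cost(energy_j, time_s))

tuner = ConfigTuner(power_limits=[150, 200, 250, 300])   # watts, hypothetical limits
limit = tuner.choose()                                   # config for the next recurrence
tuner.record(limit, energy_j=5.2e5, time_s=1800)         # feed back measured energy/time
```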
Physical Plan Instrumentation in Databases: Mechanisms and Applications
Database management systems (DBMSs) are designed to compile SQL queries into physical plans that, when executed, produce the queries' results. Building on this functionality, an ever-increasing number of application domains (e.g., provenance management, online query optimization, physical database design, interactive data profiling, monitoring, and interactive data visualization) seek to operate on how queries are executed by the DBMS, for a wide variety of purposes ranging from debugging and data explanation to optimization and monitoring. Unfortunately, DBMSs provide little, if any, support to facilitate the development of this class of important applications. As a result, database application developers and database system architects either rewrite the database internals in ad hoc ways; work around the SQL interface, if possible, with inevitable performance penalties; or even build new databases from scratch only to express and optimize their domain-specific application logic over how queries are executed.
To address this problem in a principled manner in this dissertation, we introduce a prototype DBMS, namely, Smoke, that exposes instrumentation mechanisms in the form of a framework to allow external applications to manipulate physical plans. Intuitively, a physical plan is the underlying representation that DBMSs use to encode how a SQL query will be executed, and providing instrumentation mechanisms at this representation level allows applications to express and optimize their logic on how queries are executed.
Having such an instrumentation-enabled DBMS in place, we then consider how to express and optimize applications whose logic relies on how queries are executed. To best demonstrate the expressive and optimization power of instrumentation-enabled DBMSs, we express and optimize applications across several important domains, including provenance management, interactive data visualization, interactive data profiling, physical database design, online query optimization, and query discovery. Expressivity-wise, we show that Smoke can express known techniques, introduce novel semantics on known techniques, and introduce new techniques across domains. Performance-wise, we show case by case that Smoke is on par with, or up to several orders of magnitude faster than, state-of-the-art imperative and declarative implementations of important applications across domains.
As such, we believe our contributions provide evidence for, and form the basis of, a class of instrumentation-enabled DBMSs aimed at expressing and optimizing applications, across important domains, whose core logic operates over how queries are executed by the DBMS.
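To make the notion of physical plan instrumentation concrete, the toy sketch below wraps a filter operator so that, besides producing its output, it records which input rows contributed to each output row (lineage capture, one of the provenance use cases above). It is purely conceptual; Smoke's mechanisms live inside a DBMS's physical plans, not in Python iterators.

```python
# Toy illustration of instrumenting a physical operator: a filter that, besides
# producing its output, records which input rows produced it (lineage).
def instrumented_filter(rows, predicate, lineage):
    """Yield rows satisfying predicate while recording input->output lineage."""
    for in_pos, row in enumerate(rows):
        if predicate(row):
            lineage.append(in_pos)     # remember which input row produced this output
            yield row

lineage = []
employees = [("ann", 140_000), ("bob", 90_000), ("eve", 200_000)]
high_paid = list(instrumented_filter(employees, lambda r: r[1] > 100_000, lineage))
print(high_paid)   # [('ann', 140000), ('eve', 200000)]
print(lineage)     # [0, 2] -> provenance of each output row
```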
Use of a Fast Information Extraction Method as a Decision Support Tool
Ad-hoc extraction of information from documents can ensure the transparency of decisions made by an organization. Different Information Extraction methods have been applied to extract information from various domains. The most widely known methods use manually annotated training documents that require high development time, and automated training methods are not scalable to large application domains. We have developed a semi-automated knowledge-engineering method for building the knowledge base with minimal effort. Because our method reduces manual processing of the training data, the development process is very fast. We have developed a prototype application to extract information from the project reports of the American Recovery and Reinvestment Act (ARRA) of 2009. The fast development process of our system, its scalability to large application domains, and its high extraction effectiveness will help ensure the transparency of management decisions by extracting and mining relevant information.
A Survey on Automatic Parameter Tuning for Big Data Processing Systems
Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators grapple with understanding and tuning them to achieve good performance. We investigate existing approaches to parameter tuning for both batch and stream data processing systems and classify them into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We summarize the pros and cons of each approach and raise some open research problems for automatic parameter tuning.
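As a flavor of the simplest category in this taxonomy, rule-based tuning derives settings from fixed heuristics over cluster resources. The sketch below uses Spark-style parameter names and rules of thumb that circulate as common community guidance; they are illustrative only and are not drawn from the surveyed papers.

```python
# Toy rule-based tuner: derive a few Spark-like settings from cluster resources
# using fixed heuristics. The rules are illustrative assumptions, not recommendations.
def rule_based_config(node_cores, node_mem_gb, num_nodes):
    cores_per_executor = 5 if node_cores >= 5 else node_cores            # cap cores per executor
    executors_per_node = max(1, (node_cores - 1) // cores_per_executor)  # leave a core for OS/daemons
    mem_per_executor = int((node_mem_gb - 1) / executors_per_node * 0.9) # reserve memory headroom
    return {
        "spark.executor.cores": cores_per_executor,
        "spark.executor.instances": executors_per_node * num_nodes,
        "spark.executor.memory": f"{mem_per_executor}g",
    }

print(rule_based_config(node_cores=16, node_mem_gb=64, num_nodes=10))
```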
Fast machine translation on parallel and massively parallel hardware
Parallel systems have been widely adopted in the field of machine translation, because the raw computational power they offer is well suited to this computationally intensive task. However, programming for parallel hardware is not trivial, as it requires redesign of the existing algorithms. In my thesis I design efficient algorithms for machine translation on parallel hardware. I identify memory accesses as the biggest bottleneck to processing speed and propose novel algorithms that minimize them. I present three distinct case studies in which minimizing memory access substantially improves speed: Starting with statistical machine translation, I design a phrase table that makes decoding ten times faster on a multi-threaded CPU. Next, I design a GPU-based n-gram language model that is twice as fast per £ as a highly optimized CPU implementation. Turning to neural machine translation, I design new stochastic gradient descent techniques that make end-to-end training twice as fast. The work in this thesis has been incorporated into two popular machine translation toolkits: Moses and Marian.
Evaluation of Sql Performance Tuning Features in Oracle Database Software
Timely access to data is one of the most important requirements of database management systems. Having access to data in acceptable time is crucial for efficient decision making. Tuning inefficient SQL is one of the most important elements of enhancing database performance. With growing repositories and the complexity of underlying data management systems, maintaining decent levels of performance through tuning has become a complicated task. DBMS providers acknowledge this tendency and have developed tools and features that simplify the process. DBAs and developers have to make use of these tools in the attempt to provide their companies with stable and efficient systems. Performance tuning functions differ from platform to platform. Oracle is one of the leading DBMS providers, and this study focuses on the tools provided across releases of its software. A thorough literature analysis is performed in order to gain an understanding of the functionality of each tool, and each tool is assessed. The study also provides insight into the actual utilization of these tools by gathering responses through an online survey and an analysis of the results.