
    Automatic partitioning of database applications

    Database-backed applications are nearly ubiquitous in our daily lives. Applications that make many small accesses to the database create two challenges for developers: increased latency and wasted resources from numerous network round trips. A well-known technique to improve transactional database application performance is to convert part of the application into stored procedures that are executed on the database server. Unfortunately, this conversion is often difficult. In this paper we describe Pyxis, a system that takes database-backed applications and automatically partitions their code into two pieces, one of which is executed on the application server and the other on the database server. Pyxis profiles the application and server loads, statically analyzes the code's dependencies, and produces a partitioning that minimizes the number of control transfers as well as the amount of data sent during each transfer. Our experiments using TPC-C and TPC-W show that Pyxis is able to generate partitions with up to a 3x reduction in latency and a 1.7x improvement in throughput compared to a traditional non-partitioned implementation, with performance comparable to that of a custom stored procedure implementation.
    National Science Foundation (U.S.) Graduate Research Fellowship
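    The example below is not Pyxis output; it is a minimal JDBC sketch of the round-trip problem the paper targets and of why pushing logic to the database server helps. The table, column, and procedure names (line_item, item, order_total) and the OrderTotals class are hypothetical.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical example of the "chatty" access pattern that partitioning targets,
// contrasted with running the same logic entirely on the database server.
public class OrderTotals {

    // Application-server version: one extra network round trip per line item.
    static double totalOnAppServer(Connection db, int orderId) throws SQLException {
        double total = 0;
        try (PreparedStatement items = db.prepareStatement(
                 "SELECT item_id FROM line_item WHERE order_id = ?");
             PreparedStatement price = db.prepareStatement(
                 "SELECT price FROM item WHERE id = ?")) {
            items.setInt(1, orderId);
            try (ResultSet rs = items.executeQuery()) {
                while (rs.next()) {
                    price.setInt(1, rs.getInt(1));
                    try (ResultSet pr = price.executeQuery()) {  // round trip per item
                        if (pr.next()) total += pr.getDouble(1);
                    }
                }
            }
        }
        return total;
    }

    // Database-server version: the same computation behind a single call, which is the
    // kind of server-side placement an automatic partitioner can choose for this code.
    static double totalOnDbServer(Connection db, int orderId) throws SQLException {
        try (CallableStatement call = db.prepareCall("{ ? = call order_total(?) }")) {
            call.registerOutParameter(1, java.sql.Types.DOUBLE);
            call.setInt(2, orderId);
            call.execute();
            return call.getDouble(1);
        }
    }
}
```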

    Database Queries in Java

    In conventional programming languages like Java, the interface for accessing databases is often inelegant. Typically, an entire separate database query language must be embedded inside a conventional programming language for programmers to access the full power and speed of a database. Programmers, though, prefer working entirely within their conventional programming languages, both for general-purpose computation and for database access. This thesis explores how database operations can be expressed using the existing syntax of conventional programming languages. Programmers are able to write all their code, both general-purpose code and database access code, in a single language. To run these database operations efficiently, though, algorithms are needed for finding these database operations and optimizing them. This thesis focuses on techniques that can be easily adopted because they do not require changes to existing compilers. Three systems have been developed: Queryll, JReq, and HadoopToSQL. Each system examines the problem in the context of functional-style, imperative-style, and MapReduce-style code, respectively.
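    As a rough illustration (not the Queryll, JReq, or HadoopToSQL APIs, which predate java.util.stream), the sketch below shows the kind of ordinary collection-processing Java that such a query extractor could recognize and push down to the database; the Customer record and the SQL in the comment are assumptions made for the example.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: ordinary Java code whose shape a query translator could detect
// and turn into SQL instead of evaluating it object by object in memory.
public class CustomerQueries {

    record Customer(int id, String country, double balance) {}

    // Over a persistent collection, code of this shape could be translated into:
    //   SELECT id FROM customer WHERE country = 'CH' AND balance > 1000 ORDER BY id
    static List<Integer> bigSwissCustomers(List<Customer> customers) {
        return customers.stream()
                .filter(c -> c.country().equals("CH") && c.balance() > 1000)
                .map(Customer::id)
                .sorted()
                .collect(Collectors.toList());
    }
}
```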

    Impact analysis of database schema changes

    When database schemas require change, it is typical to predict the effects of the change: first to gauge whether the change is worth the expense, and second to determine what must be reconciled once the change has taken place. Current techniques to predict the effects of schema changes upon applications that use the database can be expensive and error-prone, making the change process costly and difficult. Our thesis is that an automated approach for predicting these effects, known as an impact analysis, can create a more informed schema change process, allowing stakeholders to obtain beneficial information at lower cost than current industrial practice. This is a challenging research problem because modern data-access practices make it difficult to create an automated analysis that can identify the dependencies between applications and the database schema. In this dissertation we describe a novel analysis that overcomes these difficulties. We present a novel analysis for extracting potential database queries from a program, called query analysis. This query analysis builds upon related work, satisfying the additional requirements that we identify for impact analysis. The impacts of a schema change can be predicted by analysing the results of query analysis, using a process we call impact calculation. We describe impact calculation in detail and show how it can be implemented practically and efficiently. Because of the level of accuracy required by our query analysis, the analysis can become expensive, so we describe existing and novel approaches for keeping the analysis efficient and computationally tractable. We describe a practical and efficient prototype implementation of our schema change impact analysis, called SUITE. We describe how SUITE was used to evaluate our thesis, using a historical case study of a large commercial software project. The results of this case study show that our impact analysis is feasible for large commercial software applications and likely to be useful in real-world software development.
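    A hedged toy example of why the query analysis must look across methods: in the hypothetical DAO below, the SQL text only comes into existence through string concatenation spanning several methods, so renaming a column (say employee.salary) impacts code far from any literal that mentions it. This is the kind of dependency an impact analysis such as SUITE aims to surface; all names are invented.

```java
// Hypothetical data-access class: the query string is assembled across methods, so a
// per-statement text search would not connect a schema change to its affected call sites.
public class PayrollDao {

    private String table() {
        return "employee";
    }

    private String salaryColumn() {
        return "salary";              // renaming this column in the schema breaks the query below
    }

    // The full query only exists after concatenation at runtime.
    String highEarnersQuery(double threshold) {
        return "SELECT name FROM " + table()
             + " WHERE " + salaryColumn() + " > " + threshold;
    }
}
```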

    Program Analysis and Compilation Techniques for Speeding up Transactional Database Workloads

    There is a trend towards increased specialization of data management software for performance reasons. The improved performance not only leads to more efficient use of the underlying hardware and cuts the operating costs of the system, but is also a game-changing competitive advantage for many emerging application domains such as high-frequency algorithmic trading, clickstream analysis, infrastructure monitoring, fraud detection, and online advertising, to name a few. In this thesis, we study the automatic specialization and optimization of database application programs: sequences of queries and updates, augmented with control-flow constructs, as they appear in database scripts, user-defined functions (UDFs), transactional workloads, and triggers in languages such as PL/SQL. We propose to build online transaction processing (OLTP) systems around a modern compiler infrastructure. We show how to build an optimizing compiler for transaction programs using generative programming and state-of-the-art compiler technology, and present techniques for aggressive code inlining, fusion, deforestation, and data structure specialization in the domain of relational transaction programs. We also identify and explore the key optimizations that can be applied in this domain. In addition, we study the advantage of using program dependency analysis and restructuring to enable concurrency control algorithms to achieve higher performance. Traditionally, optimistic concurrency control algorithms, such as optimistic Multi-Version Concurrency Control (MVCC), avoid blocking concurrent transactions at the cost of a validation phase. Upon failure in the validation phase, the transaction is usually aborted and restarted from scratch. This "abort and restart" approach becomes a performance bottleneck for use cases with high-contention objects or long-running transactions. In addition, restarting from scratch creates a negative feedback loop in the system, because the system incurs additional overhead that may create even more conflicts. Using the dependency information inside the transaction programs, we propose a novel transaction repair approach for in-memory databases. This low-overhead approach summarizes the transaction programs in the form of a dependency graph, which also contains the constructs used in the validation phase of the MVCC algorithm. When conflicts among transactions are encountered, our mechanism quickly detects the conflict locations in the program and partially re-executes the conflicting transactions. This approach maximizes the reuse of the computations done in the first execution round and increases transaction processing throughput. We evaluate the proposed ideas and techniques on popular benchmarks such as TPC-C and modified versions of TPC-H and TPC-E, as well as other micro-benchmarks, and show that applying these techniques leads to 2x-100x performance improvements in many use cases. Moreover, by selectively disabling some of the optimizations in the compiler, we derive a precise way of obtaining insight into their individual performance contributions.
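    The sketch below is a deliberately simplified rendering of the repair-instead-of-restart idea, not the thesis's actual dependency-graph machinery: a transaction is recorded as named steps with explicit read sets, and after a validation failure only the steps whose reads intersect the conflicting keys are re-executed, while cached results are reused for the rest. The Step encoding and method names are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Toy "repair instead of restart" sketch: each step declares what it reads, so after a
// conflict only the affected steps are re-run and the remaining results are reused.
public class TransactionRepairSketch {

    record Step(Set<String> reads, Supplier<Object> body) {}

    private final Map<String, Step> steps = new LinkedHashMap<>();
    private final Map<String, Object> cachedResults = new LinkedHashMap<>();

    void addStep(String name, Set<String> reads, Supplier<Object> body) {
        steps.put(name, new Step(reads, body));
    }

    // First execution round: run every step and remember its result.
    void runAll() {
        steps.forEach((name, s) -> cachedResults.put(name, s.body().get()));
    }

    // Repair: re-execute only steps whose read set intersects the conflicting keys.
    void repair(Set<String> conflictingKeys) {
        steps.forEach((name, s) -> {
            boolean affected = s.reads().stream().anyMatch(conflictingKeys::contains);
            if (affected) {
                cachedResults.put(name, s.body().get());   // partial re-execution
            }                                              // otherwise reuse the cached result
        });
    }

    Object resultOf(String stepName) {
        return cachedResults.get(stepName);
    }
}
```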

    Interprocedural query extraction for transparent persistence

    Transparent persistence promises to integrate programming languages and databases by allowing procedural programs to access persistent data with the same ease as non-persistent data. Transparent persistence is more likely to be adopted if it leverages the performance and transaction management of relational databases. Since creating good relational queries from procedural programs is hard, most practical systems compromise transparency to achieve performance. In this work we demonstrate the practical feasibility of a technique for extracting relational queries from object-oriented programs. A program analysis derives query structure and conditions across methods that access persistent data. The system combines static analysis and runtime query composition to handle procedures that return persistent values. Our prototype Java compiler implements the analysis and handles recursion and parameterized queries. We evaluate the effectiveness of the optimization on the OO7 and TORPEDO benchmarks, showing that automatic optimizations are in some cases as efficient as hand-tuned code.
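    As an illustration of the input such an analysis sees (not the prototype's own API), the hypothetical code below filters persistent objects through a condition defined in a separate method; an interprocedural extractor could fold the traversal into the single relational query shown in the comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical persistent objects navigated in ordinary Java. An interprocedural query
// extractor can follow the call from namesOfAdults into isAdult and replace the in-memory
// traversal with one relational query, e.g.:
//   SELECT p.name FROM person p WHERE p.age >= 18
public class PersonRepository {

    record Person(String name, int age) {}

    private static boolean isAdult(Person p) {     // condition defined in a separate method
        return p.age() >= 18;
    }

    static List<String> namesOfAdults(List<Person> persistentPersons) {
        List<String> names = new ArrayList<>();
        for (Person p : persistentPersons) {
            if (isAdult(p)) {
                names.add(p.name());
            }
        }
        return names;
    }
}
```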

    Compilation and Code Optimization for Data Analytics

    The trade-offs between the use of modern high-level and low-level programming languages in constructing complex software artifacts are well known. High-level languages allow for greater programmer productivity: abstraction and genericity allow the same functionality to be implemented with significantly less code than in low-level languages. Modularity, object-orientation, functional programming, and powerful type systems allow programmers not only to create clean abstractions and protect them from leaking, but also to define code units that are reusable and easily composable, and software architectures that are adaptable and extensible. The abstraction, succinctness, and modularity of high-level code help to avoid software bugs and facilitate debugging and maintenance. The use of high-level languages comes at a performance cost: increased indirection due to abstraction, virtualization, and interpretation, and superfluous work, particularly in the form of temporary memory allocation and deallocation to support objects and encapsulation. As a result, the cost of high-level languages for performance-critical systems may seem prohibitive. The vision of abstraction without regret argues that it is possible to use high-level languages to build performance-critical systems that achieve both productivity and high performance, instead of trading off the former for the latter. In this thesis, we realize this vision for building different types of data analytics systems. Our means of achieving this is compilation: the goal is to compile away expensive language features, compiling high-level code down to efficient low-level code.
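    A minimal sketch of the idea, with invented names: the generic operator below pays for its abstraction (lambda objects, virtual dispatch through the functional interfaces) on every invocation, while the second method is the fused, specialized loop a compiler could generate for one concrete filter and projection, which is what "compiling away" the abstraction means here.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

// Toy illustration of abstraction without regret: a generic, reusable operator versus the
// specialized loop a compiler could emit for one particular use of it. Names are made up.
public class PipelineSpecialization {

    record Sale(String region, double amount) {}

    // High-level, reusable operator: any predicate, any projection, paid for with
    // lambda allocation and virtual dispatch at every call site.
    static double sumWhere(List<Sale> sales,
                           Predicate<Sale> filter,
                           ToDoubleFunction<Sale> project) {
        double total = 0;
        for (Sale s : sales) {
            if (filter.test(s)) {
                total += project.applyAsDouble(s);
            }
        }
        return total;
    }

    // What a specializing compiler could emit for
    // sumWhere(sales, s -> s.region().equals("EU"), Sale::amount):
    // the abstraction is gone and only the fused loop remains.
    static double sumEuAmounts(List<Sale> sales) {
        double total = 0;
        for (Sale s : sales) {
            if (s.region().equals("EU")) {
                total += s.amount();
            }
        }
        return total;
    }
}
```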
