3 research outputs found

    Program Analysis and Compilation Techniques for Speeding up Transactional Database Workloads

    Get PDF
    There is a trend towards increased specialization of data management software for performance reasons. The improved performance not only leads to a more efficient usage of the underlying hardware and cuts the operation costs of the system, but also is a game-changing competitive advantage for many emerging application domains such as high-frequency algorithmic trading, clickstream analysis, infrastructure monitoring, fraud detection, and online advertising to name a few. In this thesis, we study the automatic specialization and optimization of database application programs -- sequences of queries and updates, augmented with control flow constructs as they appear in database scripts, user-defined functions (UDFs), transactional workloads and triggers in languages such as PL/SQL. We propose to build online transaction processing (OLTP) systems around a modern compiler infrastructure. We show how to build an optimizing compiler for transaction programs using generative programming and state-of-the-art compiler technology, and present techniques for aggressive code inlining, fusion, deforestation, and data structure specialization in the domain of relational transaction programs. We also identify and explore the key optimizations that can be applied in this domain. In addition, we study the advantage of using program dependency analysis and restructuring to enable the concurrency control algorithms to achieve higher performance. Traditionally, optimistic concurrency control algorithms, such as optimistic Multi-Version Concurrency Control (MVCC), avoid blocking concurrent transactions at the cost of having a validation phase. Upon failure in the validation phase, the transaction is usually aborted and restarted from scratch. The "abort and restart" approach becomes a performance bottleneck for use cases with high contention objects or long running transactions. In addition, restarting from scratch creates a negative feedback loop in the system, because the system incurs additional overhead that may create even more conflicts. However, using the dependency information inside the transaction programs, we propose a novel transaction repair approach for in-memory databases. This low overhead approach summarizes the transaction programs in the form of a dependency graph. The dependency graph also contains the constructs used in the validation phase of the MVCC algorithm. Then, when encountering conflicts among transactions, our mechanism quickly detects the conflict locations in the program and partially re-executes the conflicting transactions. This approach maximizes the reuse of the computations done in the first execution round and increases the transaction processing throughput. We evaluate the proposed ideas and techniques in the thesis on some popular benchmarks such as TPC-C and modified versions of TPC-H and TPC-E, as well as other micro-benchmarks. We show that applying these techniques leads to 2x-100x performance improvement in many use cases. Besides, by selectively disabling some of the optimizations in the compiler, we derive a clinical and precise way of obtaining insight into their individual performance contributions

    Compilation and Code Optimization for Data Analytics

    Get PDF
    The trade-offs between the use of modern high-level and low-level programming languages in constructing complex software artifacts are well known. High-level languages allow for greater programmer productivity: abstraction and genericity allow for the same functionality to be implemented with significantly less code compared to low-level languages. Modularity, object-orientation, functional programming, and powerful type systems allow programmers not only to create clean abstractions and protect them from leaking, but also to define code units that are reusable and easily composable, and software architectures that are adaptable and extensible. The abstraction, succinctness, and modularity of high-level code help to avoid software bugs and facilitate debugging and maintenance. The use of high-level languages comes at a performance cost: increased indirection due to abstraction, virtualization, and interpretation, and superfluous work, particularly in the form of tempory memory allocation and deallocation to support objects and encapsulation. As a result of this, the cost of high-level languages for performance-critical systems may seem prohibitive. The vision of abstraction without regret argues that it is possible to use high-level languages for building performance-critical systems that allow for both productivity and high performance, instead of trading off the former for the latter. In this thesis, we realize this vision for building different types of data analytics systems. Our means of achieving this is by employing compilation. The goal is to compile away expensive language features -- to compile high-level code down to efficient low-level code

    Compilation and Code Optimization for Data Analytics

    Get PDF
    The trade-offs between the use of modern high-level and low-level programming languages in constructing complex software artifacts are well known. High-level languages allow for greater programmer productivity: abstraction and genericity allow for the same functionality to be implemented with significantly less code compared to low-level languages. Modularity, object-orientation, functional programming, and powerful type systems allow programmers not only to create clean abstractions and protect them from leaking, but also to define code units that are reusable and easily composable, and software architectures that are adaptable and extensible. The abstraction, succinctness, and modularity of high-level code help to avoid software bugs and facilitate debugging and maintenance. The use of high-level languages comes at a performance cost: increased indirection due to abstraction, virtualization, and interpretation, and superfluous work, particularly in the form of tempory memory allocation and deallocation to support objects and encapsulation. As a result of this, the cost of high-level languages for performance-critical systems may seem prohibitive. The vision of abstraction without regret argues that it is possible to use high-level languages for building performance-critical systems that allow for both productivity and high performance, instead of trading off the former for the latter. In this thesis, we realize this vision for building different types of data analytics systems. Our means of achieving this is by employing compilation. The goal is to compile away expensive language features -- to compile high-level code down to efficient low-level code
    corecore