    Making an Embedded DBMS JIT-friendly

    While database management systems (DBMSs) are highly optimized, interactions across the boundary between the programming language (PL) and the DBMS are costly, even for in-process embedded DBMSs. In this paper, we show that programs that interact with the popular embedded DBMS SQLite can be significantly optimized (by a factor of 3.4 in our benchmarks) by inlining across the PL / DBMS boundary. We achieved this speed-up by replacing parts of SQLite's C interpreter with RPython code and composing the resulting meta-tracing virtual machine (VM), called SQPyte, with the PyPy VM. SQPyte does not compromise stand-alone SQL performance and is 2.2% faster than SQLite on the widely used TPC-H benchmark suite. Comment: 24 pages, 18 figures.
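To make the PL / DBMS boundary concrete, the following is a minimal sketch using CPython's standard `sqlite3` module, not SQPyte itself. The `items` table, its columns, and both function names are hypothetical and chosen only for illustration. In `total_in_python`, every loop iteration crosses from Python into SQLite to fetch a row, which is the kind of interaction the abstract describes as costly; `total_in_sql` is included only as a contrast that crosses the boundary once.

```python
import sqlite3


def make_db(n=1000):
    """Build a small in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL, qty INTEGER)"
    )
    conn.executemany(
        "INSERT INTO items (price, qty) VALUES (?, ?)",
        [(1.5, i % 10) for i in range(n)],
    )
    return conn


def total_in_python(conn, min_qty):
    """Aggregate in Python: one PL / DBMS boundary crossing per fetched row."""
    cur = conn.execute(
        "SELECT price, qty FROM items WHERE qty >= ?", (min_qty,)
    )
    total = 0.0
    for price, qty in cur:  # each step pulls one row out of SQLite
        total += price * qty
    return total


def total_in_sql(conn, min_qty):
    """Aggregate inside SQLite: a single result row crosses the boundary."""
    row = conn.execute(
        "SELECT SUM(price * qty) FROM items WHERE qty >= ?", (min_qty,)
    ).fetchone()
    return row[0] or 0.0  # SUM over zero rows yields NULL (None)


conn = make_db()
print(total_in_python(conn, 5), total_in_sql(conn, 5))
```

Both functions compute the same value; they differ only in how often control passes between the Python interpreter and SQLite. Pushing work into SQL is not always possible when the per-row logic lives in the host language, which is the situation the paper's cross-boundary inlining targets.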

    Building Efficient Query Engines in a High-Level Language

    Abstraction without regret refers to the vision of using high-level programming languages for systems development without experiencing a negative impact on performance. A database system designed according to this vision offers both increased productivity and high performance, instead of sacrificing the former for the latter as is the case with existing, monolithic implementations that are hard to maintain and extend. In this article, we realize this vision in the domain of analytical query processing. We present LegoBase, a query engine written in the high-level language Scala. The key technique to regain efficiency is to apply generative programming: LegoBase performs source-to-source compilation and optimizes the entire query engine by converting the high-level Scala code to specialized, low-level C code. We show how generative programming makes it easy to implement a wide spectrum of optimizations, such as introducing data partitioning or switching from a row to a column data layout, which are difficult to achieve with existing low-level query compilers that handle only queries. We demonstrate that sufficiently powerful abstractions are essential for dealing with the complexity of the optimization effort, shielding developers from compiler internals and decoupling individual optimizations from each other. We evaluate our approach with the TPC-H benchmark and show that: (a) With all optimizations enabled, LegoBase significantly outperforms a commercial database and an existing query compiler. (b) Programmers need to provide just a few hundred lines of high-level code for implementing the optimizations, instead of the complicated low-level code that is required by existing query compilation approaches. (c) The compilation overhead is low compared to the overall execution time, thus making our approach usable in practice for compiling query engines.
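As a toy illustration of the generative-programming idea, the sketch below generates source code specialized to a single query and then compiles it. It is not LegoBase: LegoBase translates Scala to C, whereas this example emits Python source and uses the built-in `exec`. The function `compile_filter_sum`, its parameters, and the sample rows are all hypothetical.

```python
def compile_filter_sum(sum_column, filter_column, threshold):
    """Generate and compile a function specialized to one filter-and-sum query.

    The column names and the threshold are baked into the generated source
    as literals, so the compiled function does no per-row interpretation of
    a query plan.
    """
    source = (
        "def query(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        f"        if row[{filter_column!r}] > {threshold!r}:\n"
        f"            total += row[{sum_column!r}]\n"
        "    return total\n"
    )
    namespace = {}
    exec(source, namespace)  # compile the generated source text
    return namespace["query"], source


rows = [
    {"price": 10, "qty": 1},
    {"price": 20, "qty": 5},
    {"price": 30, "qty": 7},
]
query, generated_source = compile_filter_sum("price", "qty", 2)
print(generated_source)
print(query(rows))
```

The high-level description (which column to sum, which to filter on) stays short, while the generated code is specialized and flat. A real system would apply many further transformations, such as the data layout changes the abstract mentions, before emitting low-level code.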

    Heuristic Optimization of Physical Data Bases: Using a Generic and Abstract Design Model

    Designing efficient physical data bases is a complex activity, involving the consideration of a large number of factors. Mathematical programming-based optimization models for physical design make many simplifying assumptions; thus, their applicability is limited. In this article, we show that heuristic algorithms can be successfully used in the development of very good physical data base designs. Two heuristic optimization algorithms are proposed in the context of a generic and abstract model for physical design. One algorithm is based on generic principles of heuristic optimization. The other is based on capturing and using problem-specific information in the heuristics. The goodness of the algorithms is demonstrated over a wide range of problems and factor values.

    Optimising Sargable Conjunctive Predicate Queries in the Context of Big Data

    With the continued increase in the volume of data, the volume dimension of big data has become a significant factor in estimating query time. When all other factors are held constant, query time increases as the volume of data increases, and vice versa. Research efforts to improve query time have produced several techniques, one of which is the factorisation of query predicates. Factorisation has been used as a query optimization technique for the general class of predicates but has been found inapplicable to the subclass of sargable conjunctive equality predicates. Experiments exposed a peculiar nature of sargable conjunctive equality predicates; based on this insight, the concatenated predicate model was formulated as a means of optimising such predicates. Equations from the research results were combined to derive and prove theorems describing the application and optimality of the concatenated predicate model.

    High Level Efficiency in Database Languages

    The subject of this Ph.D. thesis is the design and implementation of database languages. The thesis consists of the following five articles and this survey paper:

    [1] Joan F. Boyar and Kim S. Larsen. Efficient Rebalancing of Chromatic Search Trees. In O. Nurmi and E. Ukkonen, eds., LNCS 621: Algorithm Theory -- SWAT'92, pp. 151-164. Springer-Verlag, 1992.
    [2] Kim S. Larsen. On Aggregation and Computation on Domain Values. PB-414, Computer Science Department, Aarhus University, 1992.
    [3] Kim S. Larsen. Strategies for Expression Evaluation Using Sort-Merge Algorithms. PB-415, Computer Science Department, Aarhus University, 1992.
    [4] Kim S. Larsen and Michael I. Schwartzbach. Injectivity of Unary Queries With Computation on Domain Values. Computer Science Department, Aarhus University, 1992. Revised version of PB-311.
    [5] Kim S. Larsen, Michael I. Schwartzbach and Erik M. Schmidt. A New Formalism for Relational Algebra. IPL, 41(3):163-168, 1992.

    In [5], a new query language design is proposed. The expressive power of the language is determined in [2] and all reasonable extensions are considered. In [3, 4], we focus on the optimization issue of avoiding unnecessary sorting of relations. The results in these papers are directly applicable to any algebra-based query language. In addition to the query language part, a database system also has to offer update facilities. The theory of standard tuple-based updates is quite well developed in the sequential case. In [1], we discuss a new concurrent implementation of balanced search trees for that purpose. This survey paper describes the results of the papers which form the thesis, and relates these results to each other and to the area in a broader sense than is customary in the introductions of individual papers. The paper is intended to be read in combination with the papers on which it is based.

    Dmodel and Dalgebra: a data model and algebra for office documents

    This dissertation presents a data model (called D_model) and an algebra (called D_algebra) for office documents. The data model adopts a very natural view of modeling office documents. Documents are grouped into classes; each class is characterized by a frame template, which describes the properties (or attributes) for the class of documents. A frame template is instantiated by providing it with values to form a frame instance, which becomes the synopsis of the document of the class associated with the frame template. Different frame instances can be grouped into a folder. Therefore, a folder is a set of frame instances which need not be over the same frame template. The D_model is a dual model which describes documents using two hierarchies: a document type hierarchy, which depicts the structural organization of the documents, and a folder organization, which represents the user's real-world document filing system. The document type hierarchy exploits structural commonalities between frame templates. Such a hierarchy helps classify various documents. The folder organization mimics the user's real-world document filing system and provides the user with an intuitively clear view of the filing system. This facilitates document retrieval activities. The D_algebra includes a family of operators which together comprise the fundamental query language for the D_model. The algebra provides operators that can be applied to folders which contain frame instances of different types. It has more expressive power than the relational algebra. It extends the classical relational algebra by associating attributes with types and supporting attribute inheritance. Aggregate operators which can be applied to different frame instances in a folder are also provided. The proposed algebra is used as a sound basis to express the semantics of a high-level query language for a document processing system called TEXPROS.
    In the model, frame instances can represent incomplete information. Null values of the form "value at present unknown" are used to denote missing information in some fields of the incomplete frame instances. This dissertation provides a proof-theoretic characterization of the data model and defines the semantics of the null values within the proof-theoretic paradigm.
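The structures named in the abstract (frame templates, frame instances, folders, attribute inheritance, and unknown values) can be sketched roughly as follows. This is an illustrative reading of the abstract only, not the dissertation's formal definitions: the class names, the `UNKNOWN` marker, the use of a Python list as a folder, and the `select` operator are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

UNKNOWN = None  # assumed stand-in for the null "value at present unknown"


@dataclass(frozen=True)
class FrameTemplate:
    """A document class; attributes are inherited along the type hierarchy."""
    name: str
    attributes: tuple
    parent: Optional["FrameTemplate"] = None

    def all_attributes(self):
        inherited = self.parent.all_attributes() if self.parent else ()
        own = tuple(a for a in self.attributes if a not in inherited)
        return inherited + own


@dataclass
class FrameInstance:
    """A frame template filled with values; missing fields become UNKNOWN."""
    template: FrameTemplate
    values: dict

    def __post_init__(self):
        allowed = self.template.all_attributes()
        extra = set(self.values) - set(allowed)
        if extra:
            raise ValueError(f"attributes not in template: {sorted(extra)}")
        self.values = {a: self.values.get(a, UNKNOWN) for a in allowed}


def select(folder, attribute, value):
    """Hypothetical selection: instances whose attribute is known and equal."""
    return [
        inst for inst in folder
        if inst.values.get(attribute, UNKNOWN) is not UNKNOWN
        and inst.values[attribute] == value
    ]


# A small type hierarchy: Memo and Letter both inherit from Document.
document = FrameTemplate("Document", ("author", "date"))
memo = FrameTemplate("Memo", ("recipient",), parent=document)
letter = FrameTemplate("Letter", ("sender_address",), parent=document)

# A folder may hold frame instances over different frame templates.
folder = [
    FrameInstance(memo, {"author": "Lee", "recipient": "Kim"}),
    FrameInstance(letter, {"author": "Lee", "date": "1992-05-01"}),
    FrameInstance(memo, {"author": "Ng"}),
]
by_lee = select(folder, "author", "Lee")
print([inst.template.name for inst in by_lee])
```

Because `author` is inherited from the shared parent template, the selection applies across instances of both `Memo` and `Letter`, which mirrors the abstract's point that operators apply to folders containing frame instances of different types.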

    Historia y evolución de las bases de datos (History and Evolution of Databases)

    The development of databases is of great interest to most people. Throughout history, databases have undergone a long process of transformation, which can be traced from their origins to their most recent developments. However, this information remains widely scattered across books, journal articles, the internet, and other sources, which makes it difficult for people interested in the topic to find it in a more synthesized form. Knowing the history of databases is important because databases have allowed people and companies to carry out their activities more efficiently and to produce results quickly and reliably. This article therefore describes some of the database management systems and some of the mechanisms with which the automation of information in companies began, as well as the management systems that have been most relevant during the evolution of databases. To that end, the study starts with a brief historical survey of databases from 1793 to the present; continues with the rise of databases, the birth of the SQL language, and the major advances in databases; briefly addresses database engines; and closes with a short contribution on what is considered relevant to the future of databases. Degree: Tecnólogo en Sistemas de Información (undergraduate).