22 research outputs found

    Polystore mathematics of relational algebra

    Get PDF
    Financial transactions, internet search, and data analysis are all placing increasing demands on databases. SQL, NoSQL, and NewSQL databases have been developed to meet these demands and each offers unique benefits. SQL, NoSQL, and NewSQL databases also rely on different underlying mathematical models. Polystores seek to provide a mechanism to allow applications to transparently achieve the benefits of diverse databases while insulating applications from the details of these databases. Integrating the underlying mathematics of these diverse databases can be an important enabler for polystores as it enables effective reasoning across different databases. Associative arrays provide a common approach for the mathematics of polystores by encompassing the mathematics found in different databases: sets (SQL), graphs (NoSQL), and matrices (NewSQL). Prior work presented the SQL relational model in terms of associative arrays and identified key mathematical properties that are preserved within SQL. This work provides the rigorous mathematical definitions, lemmas, and theorems underlying these properties. Specifically, SQL Relational Algebra deals primarily with relations - multisets of tuples - and operations on and between those relations. These relations can be modeled as associative arrays by treating tuples as non-zero rows in an array. Operations in relational algebra are built as compositions of standard operations on associative arrays which mirror their matrix counterparts. These constructions provide insight into how relational algebra can be handled via array operations. As an example application, the composition of two projection operations is shown to also be a projection, and the projection of a union is shown to be equal to the union of the projections

    Polystore mathematics of relational algebra

    Get PDF
    Financial transactions, internet search, and data analysis are all placing increasing demands on databases. SQL, NoSQL, and NewSQL databases have been developed to meet these demands and each offers unique benefits. SQL, NoSQL, and NewSQL databases also rely on different underlying mathematical models. Polystores seek to provide a mechanism to allow applications to transparently achieve the benefits of diverse databases while insulating applications from the details of these databases. Integrating the underlying mathematics of these diverse databases can be an important enabler for polystores as it enables effective reasoning across different databases. Associative arrays provide a common approach for the mathematics of polystores by encompassing the mathematics found in different databases: sets (SQL), graphs (NoSQL), and matrices (NewSQL). Prior work presented the SQL relational model in terms of associative arrays and identified key mathematical properties that are preserved within SQL. This work provides the rigorous mathematical definitions, lemmas, and theorems underlying these properties. Specifically, SQL Relational Algebra deals primarily with relations - multisets of tuples - and operations on and between those relations. These relations can be modeled as associative arrays by treating tuples as non-zero rows in an array. Operations in relational algebra are built as compositions of standard operations on associative arrays which mirror their matrix counterparts. These constructions provide insight into how relational algebra can be handled via array operations. As an example application, the composition of two projection operations is shown to also be a projection, and the projection of a union is shown to be equal to the union of the projections

    Streaming 1.9 Billion Hypersparse Network Updates per Second with D4M

    Full text link
    The Dynamic Distributed Dimensional Data Model (D4M) library implements associative arrays in a variety of languages (Python, Julia, and Matlab/Octave) and provides a lightweight in-memory database implementation of hypersparse arrays that are ideal for analyzing many types of network data. D4M relies on associative arrays which combine properties of spreadsheets, databases, matrices, graphs, and networks, while providing rigorous mathematical guarantees, such as linearity. Streaming updates of D4M associative arrays put enormous pressure on the memory hierarchy. This work describes the design and performance optimization of an implementation of hierarchical associative arrays that reduces memory pressure and dramatically increases the update rate into an associative array. The parameters of hierarchical associative arrays rely on controlling the number of entries in each level in the hierarchy before an update is cascaded. The parameters are easily tunable to achieve optimal performance for a variety of applications. Hierarchical arrays achieve over 40,000 updates per second in a single instance. Scaling to 34,000 instances of hierarchical D4M associative arrays on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 1,900,000,000 updates per second. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets.Comment: 6 pages; 6 figures; accepted to IEEE High Performance Extreme Computing (HPEC) Conference 2019. arXiv admin note: text overlap with arXiv:1807.05308, arXiv:1902.0084

    A Formal Category Theoretical Framework for Multi-model Data Transformations

    Get PDF
    Data integration and migration processes in polystores and multi-model database management systems highly benefit from data and schema transformations. Rigorous modeling of transformations is a complex problem. The data and schema transformation field is scattered with multiple different transformation frameworks, tools, and mappings. These are usually domain-specific and lack solid theoretical foundations. Our first goal is to define category theoretical foundations for relational, graph, and hierarchical data models and instances. Each data instance is represented as a category theoretical mapping called a functor. We formalize data and schema transformations as Kan lifts utilizing the functorial representation for the instances. A Kan lift is a category theoretical construction consisting of two mappings satisfying the certain universal property. In this work, the two mappings correspond to schema transformation and data transformation.Peer reviewe

    Adaptive Management of Multimodel Data and Heterogeneous Workloads

    Get PDF
    Data management systems are facing a growing demand for a tighter integration of heterogeneous data from different applications and sources for both operational and analytical purposes in real-time. However, the vast diversification of the data management landscape has led to a situation where there is a trade-off between high operational performance and a tight integration of data. The difference between the growth of data volume and the growth of computational power demands a new approach for managing multimodel data and handling heterogeneous workloads. With PolyDBMS we present a novel class of database management systems, bridging the gap between multimodel database and polystore systems. This new kind of database system combines the operational capabilities of traditional database systems with the flexibility of polystore systems. This includes support for data modifications, transactions, and schema changes at runtime. With native support for multiple data models and query languages, a PolyDBMS presents a holistic solution for the management of heterogeneous data. This does not only enable a tight integration of data across different applications, it also allows a more efficient usage of resources. By leveraging and combining highly optimized database systems as storage and execution engines, this novel class of database system takes advantage of decades of database systems research and development. In this thesis, we present the conceptual foundations and models for building a PolyDBMS. This includes a holistic model for maintaining and querying multiple data models in one logical schema that enables cross-model queries. With the PolyAlgebra, we present a solution for representing queries based on one or multiple data models while preserving their semantics. Furthermore, we introduce a concept for the adaptive planning and decomposition of queries across heterogeneous database systems with different capabilities and features. The conceptual contributions presented in this thesis materialize in Polypheny-DB, the first implementation of a PolyDBMS. Supporting the relational, document, and labeled property graph data model, Polypheny-DB is a suitable solution for structured, semi-structured, and unstructured data. This is complemented by an extensive type system that includes support for binary large objects. With support for multiple query languages, industry standard query interfaces, and a rich set of domain-specific data stores and data sources, Polypheny-DB offers a flexibility unmatched by existing data management solutions
    corecore