27 research outputs found

    Managing polyglot systems metadata with hypergraphs

    Get PDF
    A single type of data store can hardly fulfill every end-user requirements in the NoSQL world. Therefore, polyglot systems use different types of NoSQL datastores in combination. However, the heterogeneity of the data storage models makes managing the metadata a complex task in such systems, with only a handful of research carried out to address this. In this paper, we propose a hypergraph-based approach for representing the catalog of metadata in a polyglot system. Taking an existing common programming interface to NoSQL systems, we extend and formalize it as hypergraphs for managing metadata. Then, we define design constraints and query transformation rules for three representative data store types. Furthermore, we propose a simple query rewriting algorithm using the catalog itself for these data store types and provide a prototype implementation. Finally, we show the feasibility of our approach on a use case of an existing polyglot system.Peer ReviewedPostprint (author's final draft

    Adaptive Management of Multimodel Data and Heterogeneous Workloads

    Get PDF
    Data management systems are facing a growing demand for a tighter integration of heterogeneous data from different applications and sources for both operational and analytical purposes in real-time. However, the vast diversification of the data management landscape has led to a situation where there is a trade-off between high operational performance and a tight integration of data. The difference between the growth of data volume and the growth of computational power demands a new approach for managing multimodel data and handling heterogeneous workloads. With PolyDBMS we present a novel class of database management systems, bridging the gap between multimodel database and polystore systems. This new kind of database system combines the operational capabilities of traditional database systems with the flexibility of polystore systems. This includes support for data modifications, transactions, and schema changes at runtime. With native support for multiple data models and query languages, a PolyDBMS presents a holistic solution for the management of heterogeneous data. This does not only enable a tight integration of data across different applications, it also allows a more efficient usage of resources. By leveraging and combining highly optimized database systems as storage and execution engines, this novel class of database system takes advantage of decades of database systems research and development. In this thesis, we present the conceptual foundations and models for building a PolyDBMS. This includes a holistic model for maintaining and querying multiple data models in one logical schema that enables cross-model queries. With the PolyAlgebra, we present a solution for representing queries based on one or multiple data models while preserving their semantics. Furthermore, we introduce a concept for the adaptive planning and decomposition of queries across heterogeneous database systems with different capabilities and features. The conceptual contributions presented in this thesis materialize in Polypheny-DB, the first implementation of a PolyDBMS. Supporting the relational, document, and labeled property graph data model, Polypheny-DB is a suitable solution for structured, semi-structured, and unstructured data. This is complemented by an extensive type system that includes support for binary large objects. With support for multiple query languages, industry standard query interfaces, and a rich set of domain-specific data stores and data sources, Polypheny-DB offers a flexibility unmatched by existing data management solutions

    Extending a methodology for migration of the database layer to the cloud considering relational database schema migration to NoSQL

    Get PDF
    The advances in Cloud computing and in modern Web applications have raised the need for highly available and scalable distributed databases to accommodate the big data being created and consumed. Along with the explosion in data growth comes the necessity to rapidly evolve databases and schemas to meet user demands for new functionality. A special attention is being paid to the vast amounts of semi-structured and un-structured data, and the data management tools should reflect the support for these needs. This has lead to the development of new Cloud serving systems such as "Not Only" SQL (NoSQL) databases. NoSQL databases were driven by the scalability needs of the big companies, such as Google, Facebook, Amazon, and Yahoo. While the demands of these key players are different from those of small and medium enterprises in terms of scalability, the core problem is the same - storage arrays are not scalable and force you into expensive, forklift upgrades. These facts combined with changes in how IT resources are delivered and consumed through the Cloud computing paradigm, projects adopting NoSQL solutions are not a hype anymore. NoSQL databases are being offered as a service by the big Cloud providers, such as Google, Amazon, Microsoft, but by smaller vendors as well. In this master thesis we investigate the possibilities and limitations of mapping relational database schemas to NoSQL schemas when migrating the database layer to the Cloud. Based on literature research we provide recommendations and guidelines with regard to schema transformation and discuss the implications at other application architecture layers, such as business logic and data access layer. We extend an existing data migration tool and methodology for incorporating the migration guidelines and hints. Moreover, we validate our work based on a chosen sub-set of relational and NoSQL databases by using example data from the established TPC-H benchmark

    A unified data repository for rich communication services

    Get PDF
    Rich Communication Services (RCS) is a framework that defines a set of IP-based services for the delivery of multimedia communications to mobile network subscribers. The framework unifies a set of pre-existing communication services under a single name, and permits network operators to re-use investments in existing network infrastructure, especially the IP Multimedia Subsystem (IMS), which is a core part of a mobile network and also acts as a docking station for RCS services. RCS generates and utilises disparate subscriber data sets during execution, however, it lacks a harmonised repository for the management of such data sets, thus making it difficult to obtain a unified view of heterogeneous subscriber data. This thesis proposes the creation of a unified data repository for RCS which is based on the User Data Convergence (UDC) standard. The standard was proposed by the 3rd Generation Partnership Project (3GPP), a major telecommunications standardisation group. UDC provides an approach for consolidating subscriber data into a single logical repository without adversely affecting existing network infrastructure, such as the IMS. Thus, this thesis details the design and development of a prototypical implementation of a unified repository, named Converged Subscriber Data Repository (CSDR). It adopts a polyglot persistence model for the underlying data store and exposes heterogeneous data through the Open Data Protocol (OData), which is a candidate implementation of the Ud interface defined in the UDC architecture. With the introduction of polyglot persistence, multiple data stores can be used within the CSDR and disparate network data sources can access heterogeneous data sets using OData as a standard communications protocol. As the CSDR persistence model becomes more complex due to the inclusion of more storage technologies, polyglot persistence ensures a consistent conceptual view of these data sets through OData. Importantly, the CSDR prototype was integrated into a popular open-source implementation of the core part of an IMS network known as the Open IMS Core. The successful integration of the prototype demonstrates its ability to manage and expose a consolidated view of heterogeneous subscriber data, which are generated and used by different RCS services deployed within IMS

    Emergent relational schemas for RDF

    Get PDF
    corecore