5 research outputs found

    A Domain-Specific Conceptual Query System

    This thesis presents the architecture and implementation of a query system resulting from a domain-specific conceptual data modeling and querying methodology. The query system is built for a high-level conceptual query language that supports dynamically user-defined domain-specific and application-specific functions. The language is DBMS-independent and can be translated to SQL and OQL through a normal form. The system has currently been implemented in the neuroscience domain and can be applied to any other domain.

    Towards Conceptual and Logical Modelling of NoSQL Databases

    NoSQL databases support the ability to handle large volumes of data in the absence of an explicit data schema. On the other hand, schema information is sometimes essential for applications during data retrieval. Consequently, there are approaches to schema construction in, e.g., the JSON DB and graph DB communities. The difference between a conceptual schema and a database schema is often vague in this case. We use functional constructs, namely typed attributes, for a conceptual view of the database; they provide a sufficiently structured approach for expressing the semantics of document and graph data. Attribute names are natural language expressions. Such typed functional data objects can be manipulated by terms of a typed λ-calculus, providing powerful nonprocedural query features for the data structures considered. The calculus is extensible: logical, arithmetic, and aggregation functions can be included. Conceptual and database modelling merge in this case.

    RSQL - a query language for dynamic data types

    Database Management Systems (DBMS) are used by software applications to store, manipulate, and retrieve large sets of data. However, the requirements of current software systems pose various challenges to established DBMS. First, most software systems organize their data by means of objects rather than relations, leading to increased maintenance, redundancy, and transformation overhead when persisting objects to relational databases. Second, complex objects are separated into several objects, resulting in Object Schizophrenia and hard-to-persist Distributed State. Last but not least, current software systems have to cope with increased complexity and changes. These challenges have led to a general paradigm shift in the development of software systems. Unfortunately, classical DBMS will become intractable if they are not adapted to the new requirements imposed by these software systems. As a result, we propose an extension of DBMS with roles to represent complex objects within a relational database and support the flexibility required by current software systems. To achieve this goal, we introduce RSQL, an extension to SQL with the concept of objects playing roles when interacting with other objects. Additionally, we present a formal model for the logical representation of roles in the extended DBMS.

    Protein Structure Data Management System

    With advances in laboratory instruments and experimental techniques, the volume of protein data is growing explosively. How to store, retrieve, and modify protein data efficiently is therefore becoming a challenge that most biological scientists have to face. Traditional data models such as the relational model lack support for complex data types, which is a major obstacle for protein data applications. Hence many scientists have switched to object-oriented databases, since the object-oriented nature of life science data matches the architecture of object-oriented databases well, but many problems remain to be solved before OODB methodologies can be applied to manage protein data. One major problem is that general-purpose OODBs have no built-in data types for biological research and no built-in biological domain-specific functional operations. In this dissertation, we present an application system with built-in data types and built-in biological domain-specific functional operations that extends the Object-Oriented Database (OODB) system by adding domain-specific layers, Protein-QL, Protein Algebra Architecture, and Protein-OODB, above the OODB to manage protein structure data. This system is composed of three parts: 1) a Client API that provides easy usage for different users; 2) Middleware, comprising Protein-QL, Protein Algebra Architecture, and Protein-OODB, which implements the protein domain-specific query language, optimizes complex queries, and encapsulates the implementation details so that users can easily understand and master Protein-QL; and 3) Data Storage, which stores the protein data. The system targets the protein domain, but it can be easily extended to other biological domains to build a bio-OODBMS.
In this system, protein, primary, secondary, and tertiary structures are defined as internal data types to simplify queries in Protein-QL, so that domain scientists can easily master the query language and formulate data requests, and EyeDB is used as the underlying OODB to communicate with Protein-OODB. In addition, protein data is usually stored in PDB format, which is old, ambiguous, and inadequate; PDB data curation is therefore discussed in detail in the dissertation.

    Physical Design for Non-relational Data Systems

    Decades of research have gone into the optimization of physical designs, query execution, and related tools for relational databases. These techniques and tools make it possible for non-expert users to make effective use of relational database management systems. However, the drive for flexible data models and increased scalability has spawned a new generation of data management systems which largely eschew the relational model. These include NoSQL databases and distributed analytics frameworks such as Apache Spark, which make use of a diverse set of data models. Optimization techniques and tools developed for relational data do not directly apply in this setting. As a result, developers using these systems need to become intimately familiar with system details to obtain good performance. We present techniques and tools for physical design for non-relational data systems. We explore two settings: NoSQL database systems and distributed analytics frameworks. While NoSQL databases often avoid explicit schema definitions, many choices on how to structure data remain. These choices can have a significant impact on application performance. The data structuring process normally requires expert knowledge of the underlying database. We present the NoSQL Schema Evaluator (NoSE). Given a target workload, NoSE provides an optimized physical design for NoSQL database applications which compares favourably to schemas designed by expert users. To enable existing applications to benefit from conceptual modeling, we also present an algorithm to recover a logical model from a denormalized database instance. Our second setting is distributed analytics frameworks such as Apache Spark. As is the case for NoSQL databases, expert knowledge of Spark is often required to construct efficient data pipelines. In NoSQL systems, a key challenge is how to structure stored data, while in Spark, a key challenge is how to cache intermediate results.
We examine a particularly common scenario in Spark which involves performing iterative analysis on an input dataset. We show that jobs written in an intuitive manner using existing Spark APIs can have poor performance. We propose ReSpark, which automates caching decisions for iterative Spark analyses. Like NoSE, ReSpark makes it possible for non-expert users to obtain good performance from a non-relational data system.