
    NOSQL design for analytical workloads: Variability matters

    Big Data has recently gained popularity and has strongly questioned relational databases as universal storage systems, especially in the presence of analytical workloads. As a result, co-relational alternatives, commonly known as NOSQL (Not Only SQL) databases, are extensively used for Big Data. As the primary focus of NOSQL is on performance, NOSQL databases are directly designed at the physical level, and consequently the resulting schema is tailored to the dataset and access patterns of the problem at hand. However, we believe that NOSQL design can also benefit from traditional design approaches. In this paper we present a method to design databases for analytical workloads. Starting from the conceptual model and adopting the classical 3-phase design used for relational databases, we propose a novel design method considering the new features brought by NOSQL and encompassing relational and co-relational design altogether. Peer Reviewed. Postprint (author's final draft).

    Learning Models over Relational Data using Sparse Tensors and Functional Dependencies

    Integrated solutions for analytics over relational databases are of great practical importance as they avoid the costly repeated loop data scientists have to deal with on a daily basis: select features from data residing in relational databases using feature extraction queries involving joins, projections, and aggregations; export the training dataset defined by such queries; convert this dataset into the format of an external learning tool; and train the desired model using this tool. These integrated solutions are also a fertile ground of theoretically fundamental and challenging problems at the intersection of relational and statistical data models. This article introduces a unified framework for training and evaluating a class of statistical learning models over relational databases. This class includes ridge linear regression, polynomial regression, factorization machines, and principal component analysis. We show that, by synergizing key tools from database theory such as schema information, query structure, functional dependencies, recent advances in query evaluation algorithms, and from linear algebra such as tensor and matrix operations, one can formulate relational analytics problems and design efficient (query and data) structure-aware algorithms to solve them. This theoretical development informed the design and implementation of the AC/DC system for structure-aware learning. We benchmark the performance of AC/DC against R, MADlib, libFM, and TensorFlow. For typical retail forecasting and advertisement planning applications, AC/DC can learn polynomial regression models and factorization machines with at least the same accuracy as its competitors and up to three orders of magnitude faster than its competitors whenever they do not run out of memory, exceed the 24-hour timeout, or encounter internal design limitations. Comment: 61 pages, 9 figures, 2 tables.
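
As a rough illustration of what "structure-aware" can mean for aggregates over joins, the sketch below computes SUM(a * b) over a join in two ways: by materializing the join first, and by aggregating each relation per join key before combining. The relations `R` and `S`, their attributes, and both function names are hypothetical. This is a minimal sketch of pushing an aggregate past a join, not the AC/DC implementation.

```python
from collections import defaultdict

# Hypothetical relations R(key, a) and S(key, b), joined on key.
R = [(1, 2.0), (1, 3.0), (2, 5.0)]
S = [(1, 10.0), (1, 20.0), (2, 1.0), (3, 7.0)]


def sum_ab_materialized(r, s):
    """Enumerate every joined pair of R and S, then aggregate SUM(a * b)."""
    return sum(a * b for (k1, a) in r for (k2, b) in s if k1 == k2)


def sum_ab_factorized(r, s):
    """Aggregate each relation per join key first, then combine.

    Relies on the identity: the sum of a * b over all joined pairs equals
    the sum over keys of (sum of a in R for that key) * (sum of b in S
    for that key).
    """
    sum_a = defaultdict(float)
    for k, a in r:
        sum_a[k] += a
    sum_b = defaultdict(float)
    for k, b in s:
        sum_b[k] += b
    # Only keys present in both relations contribute to the join.
    return sum(sum_a[k] * sum_b[k] for k in sum_a if k in sum_b)


materialized = sum_ab_materialized(R, S)
factorized = sum_ab_factorized(R, S)
```

Both functions return the same value, but the second touches each input tuple once instead of enumerating every joined pair. Aggregates of this shape are the entries of the matrices that least-squares regression needs.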

    Temporal and Contextual Dependencies in Relational Data Modeling

    Although a solid theoretical foundation of relational data modeling has existed for decades, critical reassessment from temporal requirements’ perspective reveals shortcomings in its integrity constraints. We identify the need for this work by discussing how existing relational databases fail to ensure correctness of data when the data to be stored is time sensitive. The analysis presented in this work becomes particularly important in present times where, because of relational databases’ inadequacy to cater to all the requirements, new forms of database systems such as temporal databases, active databases, real time databases, and NoSQL (non-relational) databases have been introduced. In relational databases, temporal requirements have been dealt with either at application level using scripts or through manual assistance, but no attempts have been made to address them at design level. These requirements are the ones that need changing metadata as time progresses, which remains unsupported by Relational Database Management Systems (RDBMS) to date. Starting with shortcomings of data, entity, and referential integrity in relational data modeling, we propose a new form of integrity that works at a more detailed level of granularity. We also present several important concepts including temporal dependency, contextual dependency, and cell level integrity. We then introduce cellular-constraints to implement the proposed integrity and dependencies, and show how they can be incorporated into the relational data model to enable RDBMS to handle temporal requirements in the future. Overall, we provide a formal description to address the temporal requirements’ problem in the relational data model, and design a framework for solving this problem. We have supplemented our proposition using examples, experiments and results.

    Distribution design in object oriented databases : a thesis presented in partial fulfilment of the requirements for the degree of Master of Information Science in Information Systems

    The advanced development of object oriented database systems has attracted much research. However, very little of it contributes to the distribution design of object oriented databases. The main tasks of distribution design are fragmenting the database schema and allocating the fragments to different sites of a network. The aim of fragmentation and allocation is to improve the performance and increase the availability of a database system. Even though much research has been done on distributed databases, the research almost always refers to the relational data model (RDM). Very few efforts provide distribution design techniques for distributed object oriented databases. The aim of this work is to generalise distribution design techniques from relational databases for object oriented databases. First, the characteristics of distributed databases in general and the techniques used for fragmentation and allocation for the RDM are reviewed. Then, fragmentation operations for a rather generic object oriented data model (OODM) are developed. As with the RDM, these operations include horizontal and vertical fragmentation. A third operation named splitting is also introduced for OODM. Finally, normal predicates are introduced for OODM. A heuristic procedure for horizontal fragmentation of OODBs is also presented. The adaptation of horizontal fragmentation techniques for relational databases to object oriented databases is the main result of this work.
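
To make the two classical operations concrete, the sketch below shows horizontal fragmentation (splitting instances by predicates) and vertical fragmentation (splitting attributes while keeping an identifier in every fragment) on a small in-memory collection. The `employees` data, the `oid` identifier, and all function names are hypothetical. This is a generic relational-style sketch and does not reproduce the thesis's OODM operations, splitting, or normal predicates.

```python
# Hypothetical instances of one class; 'oid' identifies each object.
employees = [
    {"oid": 1, "name": "Ann", "site": "north", "salary": 50},
    {"oid": 2, "name": "Bob", "site": "south", "salary": 60},
    {"oid": 3, "name": "Cy", "site": "north", "salary": 70},
]


def horizontal_fragments(objects, predicates):
    """Build one fragment per named predicate, holding the objects it selects."""
    return {name: [o for o in objects if pred(o)]
            for name, pred in predicates.items()}


def vertical_fragments(objects, attribute_groups, key="oid"):
    """Project every object onto each attribute group, keeping the key."""
    return [[{key: o[key], **{a: o[a] for a in group}} for o in objects]
            for group in attribute_groups]


def rejoin(fragments, key="oid"):
    """Reconstruct full objects from vertical fragments by matching keys."""
    merged = {}
    for fragment in fragments:
        for row in fragment:
            merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


by_site = horizontal_fragments(employees, {
    "north": lambda o: o["site"] == "north",
    "south": lambda o: o["site"] == "south",
})
by_attrs = vertical_fragments(employees, [["name", "site"], ["salary"]])
```

Here the two predicates are disjoint and cover every object, so the union of the horizontal fragments gives back the original collection. Because each vertical fragment carries `oid`, `rejoin(by_attrs)` can likewise reconstruct the original objects.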

    A comparative analysis of data redundancy and execution time between relational and object oriented schema table

    Database design is one of the most important parts of building software, because the database is the data storage inside the system. Several techniques allow the programmer to improve the design of a database. One of the most popular is the relational technique, which comprises the entity-relationship diagram and normalization. The relational technique is easy to use and useful for reducing data redundancy, because normalization removes redundancy by applying normal forms to the schema tables. The second technique is the object oriented technique, which comprises the class diagram and the schema tables generated from it. An advantage of the object oriented technique is its closeness to programming languages such as C++ or C#. This project first applies the relational and object oriented techniques to determine which yields less data redundancy in database design. In the experimental results, total data redundancy in the HMS case study was 336 for the relational technique and 364 for the object oriented technique; in the course database case study it was 186 for the relational technique and 204 for the object oriented technique. The project also compares query execution time between relational and object oriented databases, measured through a user-friendly window. Query execution time in the HMS case study was 107.25 milliseconds for the RDBMS and 80.5 milliseconds for the OODBMS; in the course database case study it was 46.75 milliseconds for the RDBMS and 31.75 milliseconds for the OODBMS. The comparative analysis in this project explains these results specifically with respect to data redundancy and query execution time.
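
The reported figures can be put on a common scale by expressing the object oriented result as a percentage change relative to the relational one. The sketch below does this arithmetic using only the numbers quoted in the abstract. The `relative_change` helper and the dictionary layout are illustrative assumptions, not part of the original study.

```python
def relative_change(baseline, other):
    """Percentage change of `other` relative to `baseline`."""
    return (other - baseline) / baseline * 100.0


# (relational value, object oriented value) as quoted in the abstract.
reported = {
    "HMS redundancy": (336, 364),
    "course redundancy": (186, 204),
    "HMS query time (ms)": (107.25, 80.5),
    "course query time (ms)": (46.75, 31.75),
}

# Positive means the object oriented value is higher than the relational one.
changes = {name: round(relative_change(rel, oo), 1)
           for name, (rel, oo) in reported.items()}
```

Under this reading, the object oriented schemas show roughly 8.3% and 9.7% more redundancy, while their query times are roughly 24.9% and 32.1% lower.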