
    bdbms -- A Database Management System for Biological Data

    Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that are needed by biological databases. In this paper, we introduce bdbms, an extensible prototype database management system for supporting biological data. bdbms extends the functionalities of current DBMSs to include: (1) annotation and provenance management, including storage, indexing, manipulation, and querying of annotations and provenance as first-class objects in bdbms; (2) local dependency tracking to track the dependencies and derivations among data items; (3) update authorization to support data curation via content-based authorization, in contrast to identity-based authorization; and (4) new access methods and their supporting operators that support pattern matching on various types of compressed biological data. This paper presents the design of bdbms along with the techniques proposed to support these functionalities, including an extension to SQL. We also outline some open issues in building bdbms.
    Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/). You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, USA
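
    The abstract treats annotations and provenance as first-class, queryable objects. bdbms's actual SQL extension is not shown here, so the following is only a minimal Python/sqlite3 sketch of that idea: a side table ties each annotation and its provenance to a specific (table, row, column) cell, so annotations can be joined back into query results. All table and column names are hypothetical.

        import sqlite3

        # Illustrative sketch only, not bdbms's actual SQL extension:
        # annotations are ordinary rows that reference the cell they
        # describe, so they can be stored, indexed, and queried like data.
        con = sqlite3.connect(":memory:")
        con.executescript("""
            CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT, sequence TEXT);
            CREATE TABLE annotation (
                id     INTEGER PRIMARY KEY,
                tbl    TEXT,     -- annotated table
                row_id INTEGER,  -- annotated row
                col    TEXT,     -- annotated column
                body   TEXT,     -- the annotation itself
                source TEXT      -- provenance: where the value came from
            );
        """)
        con.execute("INSERT INTO gene VALUES (1, 'BRCA1', 'ATGGATTTA')")
        con.execute(
            "INSERT INTO annotation (tbl, row_id, col, body, source) "
            "VALUES ('gene', 1, 'sequence', 'manually curated', 'GenBank')")

        # Propagate annotations into a query result by joining them back
        # onto the cells they describe.
        for row in con.execute("""
                SELECT g.symbol, g.sequence, a.body, a.source
                FROM gene g
                LEFT JOIN annotation a
                       ON a.tbl = 'gene' AND a.row_id = g.id AND a.col = 'sequence'
                """):
            print(row)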

    Towards Intelligent Databases

    This article is a presentation of the objectives and techniques of deductive databases. The deductive approach to databases aims to extend, with intensional definitions, other database paradigms that describe applications extensionally. We first show how constructive specifications can be expressed with deduction rules, and how normative conditions can be defined using integrity constraints. We outline the principles of bottom-up and top-down query answering procedures and present the techniques used for integrity checking. We then argue that it is often desirable to manage with a database system not only database applications, but also specifications of system components. We present such meta-level specifications and discuss their advantages over conventional approaches.
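
    The bottom-up procedure mentioned above computes the consequences of deduction rules iteratively until a fixpoint is reached. As a hedged illustration (the facts and rules below are invented for the example, not taken from the article), here is a naive bottom-up evaluation of the classic ancestor program:

        # Naive bottom-up (fixpoint) evaluation of the textbook rules:
        #   ancestor(X, Y) <- parent(X, Y)
        #   ancestor(X, Z) <- parent(X, Y), ancestor(Y, Z)
        # The facts below are hypothetical illustration data.
        parent = {("ann", "bob"), ("bob", "carol"), ("carol", "dan")}

        ancestor = set(parent)            # base rule: every parent edge
        changed = True
        while changed:                    # iterate until no new facts appear
            changed = False
            for (x, y) in parent:
                for (y2, z) in list(ancestor):
                    if y == y2 and (x, z) not in ancestor:
                        ancestor.add((x, z))
                        changed = True

        print(sorted(ancestor))           # all six transitive ancestor pairs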

    Learning Models over Relational Data using Sparse Tensors and Functional Dependencies

    Integrated solutions for analytics over relational databases are of great practical importance as they avoid the costly repeated loop data scientists have to deal with on a daily basis: select features from data residing in relational databases using feature extraction queries involving joins, projections, and aggregations; export the training dataset defined by such queries; convert this dataset into the format of an external learning tool; and train the desired model using this tool. These integrated solutions are also a fertile ground for theoretically fundamental and challenging problems at the intersection of relational and statistical data models. This article introduces a unified framework for training and evaluating a class of statistical learning models over relational databases. This class includes ridge linear regression, polynomial regression, factorization machines, and principal component analysis. We show that, by synergizing key tools from database theory, such as schema information, query structure, functional dependencies, and recent advances in query evaluation algorithms, with tools from linear algebra, such as tensor and matrix operations, one can formulate relational analytics problems and design efficient (query and data) structure-aware algorithms to solve them. This theoretical development informed the design and implementation of the AC/DC system for structure-aware learning. We benchmark the performance of AC/DC against R, MADlib, libFM, and TensorFlow. For typical retail forecasting and advertisement planning applications, AC/DC can learn polynomial regression models and factorization machines with at least the same accuracy as its competitors and up to three orders of magnitude faster, whenever the competitors do not run out of memory, exceed the 24-hour timeout, or encounter internal design limitations.
    Comment: 61 pages, 9 figures, 2 tables
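
    One way to see why learning can be pushed inside the database: models such as ridge regression depend on the data only through aggregate statistics, which a structure-aware engine like AC/DC can in principle compute over the join without materializing it. The sketch below shows only the sufficient-statistics formulation, on a synthetic and already-materialized matrix; it is not AC/DC's factorized algorithm.

        import numpy as np

        # Ridge regression needs only the aggregates X^T X and X^T y;
        # systems like AC/DC compute such aggregates over the join itself.
        # Here we use a synthetic matrix purely for illustration.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 5))          # stand-in for join-output features
        true_theta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
        y = X @ true_theta + rng.normal(scale=0.1, size=1000)

        lam = 0.1                                # ridge penalty
        XtX = X.T @ X                            # d x d aggregate
        Xty = X.T @ y                            # length-d aggregate
        theta = np.linalg.solve(XtX + lam * np.eye(5), Xty)
        print(theta)                             # close to true_theta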

    Information-Based Distance Measures and the Canonical Reflection of View Updates

    For the problem of reflecting an update on a database view to the main schema, the constant-complement strategies are precisely those which avoid all update anomalies, and so define the gold standard for well-behaved solutions to the problem. However, the families of view updates which are supported under such strategies are limited, so it is sometimes necessary to go beyond them, albeit in a systematic fashion. In this work, an investigation of such extended strategies is initiated for relational schemata. The approach is to characterize the information content of a database instance, and then require that the optimal reflection of a view update to the main schema embody the least possible change of information. The key property is identified to be strong monotonicity of the view, meaning that view insertions may always be reflected as insertions to the main schema, and likewise for deletions. In that context it is shown that for insertions and deletions, an optimal update, entailing the least change of information, exists and is unique up to isomorphism for wide classes of constraints.
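
    To make "least change of information" concrete, here is a deliberately tiny, brute-force sketch (not the paper's construction): the view is a projection of a base relation, and an insertion into the view is reflected by the smallest set of base insertions, drawn from a hypothetical candidate pool, that makes the new view tuple appear.

        from itertools import combinations

        # Toy illustration of "optimal reflection = least change":
        # base relation R(a, b); the view projects R onto its first column.
        def view(R):
            return {a for (a, _) in R}

        def reflect_insert(R, new_a, candidates):
            # Search subsets of the candidate pool, smallest first, for a
            # set of base insertions that makes new_a visible in the view.
            pool = [t for t in candidates if t not in R]
            for k in range(len(pool) + 1):
                for delta in combinations(pool, k):
                    if new_a in view(R | set(delta)):
                        return set(delta)
            return None

        R = {("a1", "b1"), ("a2", "b1")}
        candidates = [("a3", "b1"), ("a3", "b2")]
        print(reflect_insert(R, "a3", candidates))  # a single tuple suffices

    Note that the view insertion is reflected purely as base insertions, echoing the strong-monotonicity property the abstract identifies as key.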

    On distributed data processing in data grid architecture for a virtual repository

    The article describes the problem of integrating distributed, heterogeneous, and fragmented collections of data by applying the virtual repository and data grid concepts. The technology involves: wrappers enveloping external resources, a virtual network (based on peer-to-peer technology) responsible for integrating data into one global schema, and a distributed index for speeding up data retrieval. The authors present a method for obtaining data from heterogeneously structured external databases and then a procedure for integrating the data into one commonly available global schema. The core of the described solution is based on the Stack-Based Query Language (SBQL) and virtual updatable SBQL views. The system's transport and indexing layer is based on the P2P architecture.
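
    The wrapper layer can be pictured as adapters that translate each source's native structure into the global schema. The sketch below is hypothetical Python, not SBQL (the actual system builds on SBQL updatable views); it only illustrates how enveloped sources become uniformly queryable, with all formats and field names invented for the example.

        from dataclasses import dataclass
        from typing import Iterable

        @dataclass
        class Person:                 # the global schema seen by clients
            name: str
            city: str

        class CsvWrapper:
            """Envelops a semicolon-separated source (hypothetical format)."""
            def __init__(self, rows):
                self.rows = rows
            def scan(self) -> Iterable[Person]:
                for line in self.rows:
                    name, city = line.split(";")
                    yield Person(name, city)

        class JsonWrapper:
            """Envelops a document source with its own field names."""
            def __init__(self, docs):
                self.docs = docs
            def scan(self) -> Iterable[Person]:
                for d in self.docs:
                    yield Person(d["full_name"], d["address"]["city"])

        # A mediator integrates all wrapped sources into one virtual view.
        sources = [
            CsvWrapper(["Alice;Oslo", "Bob;Lyon"]),
            JsonWrapper([{"full_name": "Carol", "address": {"city": "Turin"}}]),
        ]
        print([(p.name, p.city) for src in sources for p in src.scan()])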

    Online Data Modeling Tool to Improve Students' Learning of Conceptual Data Modeling

    Computer-based instruction (CBI) has been used in science, mathematics, and engineering education and has shown positive outcomes. CBI is generally most effective when used as a supplement to traditional teaching. In this research-in-progress paper, the author presents an online data modeling tool as a supplement to data modeling teaching and proposes a research plan to measure students' perceived effectiveness of the online tool.
