
    bdbms -- A Database Management System for Biological Data

    Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that are needed by biological databases. In this paper, we introduce bdbms, an extensible prototype database management system for supporting biological data. bdbms extends the functionalities of current DBMSs to include: (1) Annotation and provenance management, including storage, indexing, manipulation, and querying of annotations and provenance as first-class objects in bdbms, (2) Local dependency tracking to track the dependencies and derivations among data items, (3) Update authorization to support data curation via content-based authorization, in contrast to identity-based authorization, and (4) New access methods and their supporting operators that support pattern matching on various compressed biological data types. This paper presents the design of bdbms along with the techniques proposed to support these functionalities, including an extension to SQL. We also outline some open issues in building bdbms. Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/). You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but you must attribute the work to the author and CIDR 2007, 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, US
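The abstract's first point, treating annotations and provenance as first-class, queryable objects, can be illustrated with a minimal Python sketch. All names here (`AnnotatedTable`, `annotate`, `select_with_annotations`) are hypothetical stand-ins, not bdbms's actual SQL extension or API.

```python
# Hypothetical sketch: annotations kept alongside data and returned
# as first-class query results rather than discarded, as the abstract
# describes (illustrative names only, not bdbms's real interface).

class AnnotatedTable:
    def __init__(self):
        self.rows = {}          # row_id -> {column: value}
        self.annotations = {}   # (row_id, column) -> [annotation, ...]

    def insert(self, row_id, **values):
        self.rows[row_id] = dict(values)

    def annotate(self, row_id, column, note, source):
        # Provenance (where the note came from) travels with the annotation.
        self.annotations.setdefault((row_id, column), []).append(
            {"note": note, "source": source})

    def select_with_annotations(self, column):
        # A query returns each value together with its annotations,
        # instead of projecting them away as an ordinary DBMS would.
        return [
            (row_id, vals[column],
             self.annotations.get((row_id, column), []))
            for row_id, vals in self.rows.items()
            if column in vals
        ]

t = AnnotatedTable()
t.insert(1, sequence="ACGT")
t.annotate(1, "sequence", "low read quality", source="lab-3")
print(t.select_with_annotations("sequence"))
# [(1, 'ACGT', [{'note': 'low read quality', 'source': 'lab-3'}])]
```

A content-based authorization check (point 3) would slot naturally into `annotate` and `insert`, inspecting the values being written rather than only the writer's identity.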

    Code Generation for Efficient Query Processing in Managed Runtimes

    In this paper we examine opportunities arising from the convergence of two trends in data management: in-memory database systems (IMDBs), which have received renewed attention following the availability of affordable, very large main memory systems; and language-integrated query, which transparently integrates database queries with programming languages (thus addressing the famous 'impedance mismatch' problem). Language-integrated query not only gives application developers a more convenient way to query external data sources like IMDBs, but also lets them use the same query language over an application's in-memory collections. The latter offers further transparency to developers, as the query language and all data are represented in the data model of the host programming language. However, compared to IMDBs, this additional freedom comes at a higher cost for query evaluation. Our vision is to improve in-memory query processing of application objects by introducing database technologies to managed runtimes. We focus on querying, and we leverage query compilation to improve query processing on application objects. We explore different query compilation strategies and study how they improve the performance of query processing over application data. We take C# as the host programming language as it supports language-integrated query through the LINQ framework. Our techniques deliver significant performance improvements over the default LINQ implementation. Our work makes important first steps towards a future where data processing applications will commonly run on machines that can store their entire datasets in-memory, and will be written in a single programming language employing language-integrated query and IMDB-inspired runtimes to provide transparent and highly efficient querying.
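The core idea of query compilation, generating specialized code for a query once instead of interpreting it per element, can be sketched in a few lines of Python. This mirrors the general technique only; the paper's actual strategies target C# and the LINQ framework, and `compile_filter` is an invented name.

```python
# Illustrative query-compilation sketch: a declarative predicate is
# turned into specialized generated code before execution, avoiding
# per-element interpretation overhead (not the paper's C#/LINQ code).

def compile_filter(column, op, constant):
    # Generate source code specialized to this one predicate.
    src = (
        "def _q(rows):\n"
        f"    return [r for r in rows if r[{column!r}] {op} {constant!r}]\n"
    )
    namespace = {}
    exec(src, namespace)   # compile once; the result runs many times
    return namespace["_q"]

query = compile_filter("age", ">", 30)
people = [{"age": 25}, {"age": 42}, {"age": 31}]
print(query(people))
# [{'age': 42}, {'age': 31}]
```

An interpreted equivalent would re-dispatch on `op` for every row; the compiled function bakes the comparison directly into the generated code, which is the performance gap query compilation closes.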

    Scholars Forum: A New Model For Scholarly Communication

    Scholarly journals have flourished for over 300 years because they successfully address a broad range of authors' needs: to communicate findings to colleagues, to establish precedence of their work, to gain validation through peer review, to establish their reputation, to know the final version of their work is secure, and to know their work will be accessible by future scholars. Eventually, the development of comprehensive paper and then electronic indexes allowed past work to be readily identified and cited. Just as postal service made it possible to share scholarly work regularly and among a broad readership, the Internet now provides a distribution channel with the power to reduce publication time and to expand traditional print formats by supporting multi-media options and threaded discourse. Despite widespread acceptance of the web by the academic and research community, the incorporation of advanced network technology into a new paradigm for scholarly communication by the publishers of print journals has not materialized. Nor have journal publishers used the lower cost of distribution on the web to make online versions of journals available at lower prices than print versions. It is becoming increasingly clear to the scholarly community that we must envision and develop for ourselves a new, affordable model for disseminating and preserving results, one that synthesizes digital technology and the ongoing needs of scholars. In March 1997, with support from the Engineering Information Foundation, Caltech sponsored a Conference on Scholarly Communication to open a dialogue around key issues and to consider the feasibility of alternative undertakings. A general consensus emerged recognizing that the certification of scholarly articles through peer review could be "decoupled" from the rest of the publishing process, and that the peer review process is already supported by the universities whose faculty serve as editors, members of editorial boards, and referees.
    In the meantime, pressure to enact regressive copyright legislation has added another important element. The ease with which electronic files may be copied and forwarded has encouraged publishers and other owners of copyrighted material to seek means for denying access to anything they own in digital form to all but active subscribers or licensees. Furthermore, should publishers retain the only version of a publication in digital form, there is a significant risk that this material may eventually be lost through culling little-used or unprofitable back-files, through not investing in conversion expense as technology evolves, through changes in ownership, or through catastrophic physical events. Such a scenario presents an intolerable threat to the future of scholarship.

    An Object-Oriented Language-Database Integration Model: The Composition-Filters Approach

    This paper introduces a new model, based on so-called object-composition filters, that uniformly integrates database-like features into an object-oriented language. The focus is on providing persistent dynamic data structures, data sharing, transactions, multiple views and associative access, integrated with the object-oriented paradigm. The main contribution is that the database-like features are part of this new object-oriented model and, therefore, are uniformly integrated with object-oriented features such as data abstraction, encapsulation, message passing and inheritance. This approach eliminates the problems associated with existing systems, such as lack of reusability and extensibility for database operations, the violation of encapsulation, the need to define specific types such as sets, and the inability to support multiple views. The model is illustrated through the object-oriented language Sina.
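The composition-filters idea, messages to an object passing through filters that can add concerns like persistence without modifying the class itself, can be approximated with a toy Python sketch. This is illustrative only; Sina's filter semantics are richer, and `PersistenceFilter` is an invented name.

```python
# Toy sketch of a composition filter: incoming messages (method calls)
# are intercepted before reaching the target object, so a database-like
# concern (here, logging updates as a stand-in for persistence) is
# layered on without touching the class (illustrative, not Sina).

class PersistenceFilter:
    def __init__(self, target):
        self._target = target
        self.log = []            # stands in for a persistent store

    def __getattr__(self, name):
        attr = getattr(self._target, name)
        if callable(attr):
            def filtered(*args, **kwargs):
                self.log.append((name, args))   # intercept the message
                return attr(*args, **kwargs)
            return filtered
        return attr              # plain attributes pass through unchanged

class Account:
    def __init__(self):
        self.balance = 0
    def deposit(self, amount):
        self.balance += amount

acct = PersistenceFilter(Account())
acct.deposit(100)
print(acct.log)        # [('deposit', (100,))]
print(acct.balance)    # 100
```

Because the filter is separate from `Account`, the same class can be wrapped with different filters for different concerns (views, transactions), which is the reusability argument the abstract makes against hard-wiring database operations into each class.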