22 research outputs found

    A Data Transformation System for Biological Data Sources

    Scientific data of importance to biologists in the Human Genome Project resides not only in conventional databases, but in structured files maintained in a number of different formats (e.g. ASN.1 and ACE) as well as sequence analysis packages (e.g. BLAST and FASTA). These formats and packages contain a number of data types not found in conventional databases, such as lists and variants, and may be deeply nested. We present in this paper techniques for querying and transforming such data, and illustrate their use in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22. We also describe optimizations performed by the system, a crucial issue for bulk data

    Requesting Heterogeneous Data Sources with Array Comprehensions in Hop.js

    During the past few years the volume of accumulated data has increased dramatically. New kinds of data stores have emerged as NoSQL family stores. Many modern applications now collect, analyze, and produce data from several heterogeneous sources. However, implementing such applications is still difficult because of a lack of appropriate tools and formalisms. We propose a solution to this problem in the context of the JavaScript programming language by extending array comprehensions. Our extension allows programmers to query data from usual stores, such as SQL databases, NoSQL databases, Semantic Web data repositories, Web pages, or even custom user-defined data structures. The extension has been implemented in the Hop.js system. It is the subject of this paper

    Querying an Object-Oriented Database Using CPL

    The Collection Programming Language (CPL) is based on a complex value model of data and has successfully been used for querying, transforming, and integrating data from a wide variety of structured data sources - relational, ACeDB, and ASN.1 among others. However, since there is no notion of objects and classes in CPL, it cannot adequately model recursive types or inheritance, and hence cannot be used to query object-oriented databases (OODBs). By adding a reference type and four operations to CPL - dereference, method invocation, identity test and class type cast - it is possible to express a large class of interesting safe queries against OODBs. As an example of how the extended CPL can be used to query an OODB, we will describe how the extended language has been used as a query interface to Shore databases

    Updating Complex Value Databases

    Query languages and their optimizations have been a very important issue in the database community. Languages for updating databases, however, have not been studied to the same extent, although they are clearly important since databases must change over time. The structure and expressiveness of updates is largely dependent on the data model. In relational databases, for example, the update language typically allows the user to specify changes to individual fields of a subset of a relation that meets some selection criterion. The syntax is terse, specifying only the pieces of the database that are to be altered. Because of its simplicity, most of the optimizations take place in the internal processing of the update rather than at the language level. In complex value databases, the need for a terse and optimizable update language is much greater, due to the deeply nested structures involved. Starting with a query language for complex value databases called the Collection Programming Language (CPL), we describe an extension called CPL+ which provides a convenient and intuitive specification of updates on complex values. CPL is a functional language, with powerful optimizations achieved through rewrite rules. Additional rewrite rules are derived for CPL+ and a notion of deltafication is introduced to transform complete updates, expressed as conventional CPL expressions, into equivalent update expressions in CPL+. As a result of applying these transformations, the performance of complex updates can increase substantially

    Comprehending Ringads for Phil Wadler, on the occasion of his 60th birthday

    List comprehensions are a widely used programming construct, in languages such as Haskell and Python and in technologies such as Microsoft's Language Integrated Query. They generalize from lists to arbitrary monads, yielding a lightweight idiom of imperative programming in a pure functional language. When the monad has the additional structure of a so-called ringad, corresponding to 'empty' and 'union' operations, then it can be seen as some kind of collection type, and the comprehension notation can also be extended to incorporate aggregations. Ringad comprehensions represent a convenient notation for expressing database queries. The ringad structure alone does not provide a good explanation or an efficient implementation of relational joins; but by allowing heterogeneous comprehensions, involving both bag and indexed table ringads, we show how to accommodate these too

    Fixpoints and Bounded Fixpoints for Complex Objects

    We investigate a query language for complex-object databases, which is designed to (1) express only tractable queries, and (2) be as expressive over flat relations as first order logic with fixpoints. The language is obtained by extending the nested relational algebra NRA with a bounded fixpoint operator. As in the flat case, all PTime computable queries over ordered databases are expressible in this language. The main result consists in proving that this language is a conservative extension of the first order logic with fixpoints, or of the while-queries (depending on the interpretation of the bounded fixpoint: inflationary or partial). The proof technique uses indexes, to encode complex objects into flat relations, and is strong enough to allow for the encoding of NRA with unbounded fixpoints into flat relations. We also define a logic-based language with fixpoints, the nested relational calculus, and prove that its range-restricted version is equivalent to NRA with bounded fixpoints

    Functional Collection Programming with Semi-Ring Dictionaries

    This paper introduces semi-ring dictionaries, a powerful class of compositional and purely functional collections that subsume other collection types such as sets, multisets, arrays, vectors, and matrices. We developed SDQL, a statically typed language that can express relational algebra with aggregations, linear algebra, and functional collections over data such as relations and matrices using semi-ring dictionaries. Furthermore, thanks to the algebraic structure behind these dictionaries, SDQL unifies a wide range of optimizations commonly used in databases (DB) and linear algebra (LA). As a result, SDQL enables efficient processing of hybrid DB and LA workloads, by putting together optimizations that are otherwise confined to either DB systems or LA frameworks. We show experimentally that a handful of DB and LA workloads can take advantage of the SDQL language and optimizations. Overall, we observe that SDQL achieves competitive performance relative to Typer and Tectorwise, which are state-of-the-art in-memory DB systems for (flat, not nested) relational data, and achieves an average 2x speedup over SciPy for LA workloads. For hybrid workloads involving LA processing, SDQL achieves up to one order of magnitude speedup over Trance, a state-of-the-art nested relational engine for nested biomedical data, and gives an average 40% speedup over LMFAO, a state-of-the-art in-DB machine learning engine for two (flat) relational real-world retail datasets

    A Practical Theory of Language-integrated Query

    Language-integrated query is receiving renewed attention, in part because of its support through Microsoft’s LINQ framework. We present a practical theory of language-integrated query based on quotation and normalisation of quoted terms. Our technique supports join queries, abstraction over values and predicates, composition of queries, dynamic generation of queries, and queries with nested intermediate data. Higher-order features prove useful even for constructing first-order queries. We prove a theorem characterising when a host query is guaranteed to generate a single SQL query. We present experimental results confirming our technique works, even in situations where Microsoft’s LINQ framework either fails to produce an SQL query or, in one case, produces an avalanche of SQL queries

    BioKleisli: Integrating Biomedical Data and Analysis Packages
