1,033,979 research outputs found

    Benchmarking database systems for Genomic Selection implementation

    Get PDF
    Motivation: With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. To make use of this information effectively requires DNA extraction facilities and marker production facilities that can efficiently deploy the desired set of markers across samples with a rapid turnaround time that allows for selection before crosses needed to be made. In reality, breeders often have a short window of time to make decisions by the time they are able to collect all their phenotyping data and receive corresponding genotyping data. This presents a challenge to organize information and utilize it in downstream analyses to support decisions made by breeders. In order to implement genomic selection routinely as part of breeding programs, one would need an efficient genotyping data storage system. We selected and benchmarked six popular open-source data storage systems, including relational database management and columnar storage systems. Results: We found that data extract times are greatly influenced by the orientation in which genotype data is stored in a system. HDF5 consistently performed best, in part because it can more efficiently work with both orientations of the allele matrix

    Adding HL7 version 3 data types to PostgreSQL

    Get PDF
    The HL7 standard is widely used to exchange medical information electronically. As a part of the standard, HL7 defines scalar communication data types like physical quantity, point in time and concept descriptor but also complex types such as interval types, collection types and probabilistic types. Typical HL7 applications will store their communications in a database, resulting in a translation from HL7 concepts and types into database types. Since the data types were not designed to be implemented in a relational database server, this transition is cumbersome and fraught with programmer error. The purpose of this paper is two fold. First we analyze the HL7 version 3 data type definitions and define a number of conditions that must be met, for the data type to be suitable for implementation in a relational database. As a result of this analysis we describe a number of possible improvements in the HL7 specification. Second we describe an implementation in the PostgreSQL database server and show that the database server can effectively execute scientific calculations with units of measure, supports a large number of operations on time points and intervals, and can perform operations that are akin to a medical terminology server. Experiments on synthetic data show that the user defined types perform better than an implementation that uses only standard data types from the database server.Comment: 12 pages, 9 figures, 6 table

    Consulting report on the NASA technology utilization network system

    Get PDF
    The purposes of this consulting effort are: (1) to evaluate the existing management and production procedures and workflow as they each relate to the successful development, utilization, and implementation of the NASA Technology Utilization Network System (TUNS) database; (2) to identify, as requested by the NASA Project Monitor, the strengths, weaknesses, areas of bottlenecking, and previously unaddressed problem areas affecting TUNS; (3) to recommend changes or modifications of existing procedures as necessary in order to effect corrections for the overall benefit of NASA TUNS database production, implementation, and utilization; and (4) to recommend the addition of alternative procedures, routines, and activities that will consolidate and facilitate the production, implementation, and utilization of the NASA TUNS database

    Deductive Optimization of Relational Data Storage

    Full text link
    Optimizing the physical data storage and retrieval of data are two key database management problems. In this paper, we propose a language that can express a wide range of physical database layouts, going well beyond the row- and column-based methods that are widely used in database management systems. We use deductive synthesis to turn a high-level relational representation of a database query into a highly optimized low-level implementation which operates on a specialized layout of the dataset. We build a compiler for this language and conduct experiments using a popular database benchmark, which shows that the performance of these specialized queries is competitive with a state-of-the-art in memory compiled database system

    B-LOG: A branch and bound methodology for the parallel execution of logic programs

    Get PDF
    We propose a computational methodology -"B-LOG"-, which offers the potential for an effective implementation of Logic Programming in a parallel computer. We also propose a weighting scheme to guide the search process through the graph and we apply the concepts of parallel "branch and bound" algorithms in order to perform a "best-first" search using an information theoretic bound. The concept of "session" is used to speed up the search process in a succession of similar queries. Within a session, we strongly modify the bounds in a local database, while bounds kept in a global database are weakly modified to provide a better initial condition for other sessions. We also propose an implementation scheme based on a database machine using "semantic paging", and the "B-LOG processor" based on a scoreboard driven controller

    Provably-secure symmetric private information retrieval with quantum cryptography

    Full text link
    Private information retrieval (PIR) is a database query protocol that provides user privacy, in that the user can learn a particular entry of the database of his interest but his query would be hidden from the data centre. Symmetric private information retrieval (SPIR) takes PIR further by additionally offering database privacy, where the user cannot learn any additional entries of the database. Unconditionally secure SPIR solutions with multiple databases are known classically, but are unrealistic because they require long shared secret keys between the parties for secure communication and shared randomness in the protocol. Here, we propose using quantum key distribution (QKD) instead for a practical implementation, which can realise both the secure communication and shared randomness requirements. We prove that QKD maintains the security of the SPIR protocol and that it is also secure against any external eavesdropper. We also show how such a classical-quantum system could be implemented practically, using the example of a two-database SPIR protocol with keys generated by measurement device-independent QKD. Through key rate calculations, we show that such an implementation is feasible at the metropolitan level with current QKD technology.Comment: 19 page

    Enterprise-Level Database Implementation

    Get PDF
    The student will design/already have designed an application requiring a robust database solution. This application may be a website, an app, a game, or a traditional application. The student will then design and implement a normalized database to accommodate the application using best practices, including assuring database normalization and designing and implementing appropriate stored procedures and function

    Implementation of the FAA research and development electromagnetic database

    Get PDF
    The Idaho National Engineering Laboratory (INEL) has been assisting the FAA in developing a database of information about lightning. The FAA Research and Development Electromagnetic Database (FRED) will ultimately contain data from a variety of airborne and ground-based lightning research projects. An outline of the data currently available in FRED is presented. The data sources which the FAA intends to incorporate into FRED are listed. In addition, it describes how the researchers may access and use the FRED menu system
    corecore