Benchmarking database systems for Genomic Selection implementation
Motivation: With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. Using this information effectively requires DNA extraction facilities and marker production facilities that can efficiently deploy the desired set of markers across samples with a rapid turnaround time that allows for selection before crosses need to be made. In reality, breeders often have a short window of time to make decisions by the time they are able to collect all their phenotyping data and receive corresponding genotyping data. This presents a challenge to organize information and utilize it in downstream analyses to support decisions made by breeders. In order to implement genomic selection routinely as part of breeding programs, one would need an efficient genotyping data storage system. We selected and benchmarked six popular open-source data storage systems, including relational database management and columnar storage systems. Results: We found that data extract times are greatly influenced by the orientation in which genotype data is stored in a system. HDF5 consistently performed best, in part because it can more efficiently work with both orientations of the allele matrix.
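The effect of storage orientation can be illustrated with a minimal sketch. The toy data, the 0/1/2 genotype coding, and all function names below are assumptions made for illustration; they are not taken from the benchmarked systems.

```python
# Hypothetical toy allele matrix: 3 samples x 4 markers, genotypes coded 0/1/2.
samples = ["s1", "s2", "s3"]
markers = ["m1", "m2", "m3", "m4"]

# Sample-major orientation: one row per sample.
sample_major = [
    [0, 1, 2, 0],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]


def transpose(matrix):
    """Return the matrix in the other orientation."""
    return [list(column) for column in zip(*matrix)]


# Marker-major orientation: one row per marker.
marker_major = transpose(sample_major)


def genotypes_for_sample(name):
    """All markers for one sample: a single row read in sample-major storage."""
    return sample_major[samples.index(name)]


def genotypes_for_marker(name):
    """All samples for one marker: a single row read in marker-major storage."""
    return marker_major[markers.index(name)]


def genotypes_for_marker_slow(name):
    """Same query against sample-major storage: touches every row."""
    j = markers.index(name)
    return [row[j] for row in sample_major]
```

Keeping both orientations doubles the storage but lets each kind of extract read one contiguous row, which is the trade-off the abstract alludes to.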
Adding HL7 version 3 data types to PostgreSQL
The HL7 standard is widely used to exchange medical information
electronically. As a part of the standard, HL7 defines scalar communication
data types like physical quantity, point in time and concept descriptor but
also complex types such as interval types, collection types and probabilistic
types. Typical HL7 applications will store their communications in a database,
resulting in a translation from HL7 concepts and types into database types.
Since the data types were not designed to be implemented in a relational
database server, this transition is cumbersome and fraught with programmer
error. The purpose of this paper is twofold. First, we analyze the HL7 version
3 data type definitions and define a number of conditions that must be met for
a data type to be suitable for implementation in a relational database. As a
result of this analysis, we describe a number of possible improvements in the
HL7 specification. Second, we describe an implementation in the PostgreSQL
database server and show that the database server can effectively execute
scientific calculations with units of measure, supports a large number of
operations on time points and intervals, and can perform operations that are
akin to a medical terminology server. Experiments on synthetic data show that
the user defined types perform better than an implementation that uses only
standard data types from the database server.
Comment: 12 pages, 9 figures, 6 tables
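As a rough illustration of what a calculation with units of measure involves, here is a minimal Python sketch of a physical-quantity type. It is not the paper's PostgreSQL implementation; the class name, the conversion table, and the choice of mass units are all assumptions made for the example.

```python
from dataclasses import dataclass
from fractions import Fraction

# Hypothetical conversion table: how many grams one of each unit represents.
GRAMS_PER_UNIT = {
    "mg": Fraction(1, 1000),
    "g": Fraction(1),
    "kg": Fraction(1000),
}


@dataclass(frozen=True)
class PhysicalQuantity:
    """A value paired with a unit, loosely modelled on a physical quantity type."""

    value: Fraction
    unit: str

    def to(self, unit):
        """Convert to another unit listed in the table."""
        grams = Fraction(self.value) * GRAMS_PER_UNIT[self.unit]
        return PhysicalQuantity(grams / GRAMS_PER_UNIT[unit], unit)

    def __add__(self, other):
        """Add two quantities; the result keeps the left operand's unit."""
        return PhysicalQuantity(self.value + other.to(self.unit).value, self.unit)

    def __lt__(self, other):
        """Compare after converting the right operand to the left operand's unit."""
        return self.value < other.to(self.unit).value


total = PhysicalQuantity(500, "mg") + PhysicalQuantity(1, "g")
```

Exact fractions are used instead of floats so that conversions between units do not accumulate rounding error.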
Consulting report on the NASA technology utilization network system
The purposes of this consulting effort are: (1) to evaluate the existing management and production procedures and workflow as they each relate to the successful development, utilization, and implementation of the NASA Technology Utilization Network System (TUNS) database; (2) to identify, as requested by the NASA Project Monitor, the strengths, weaknesses, areas of bottlenecking, and previously unaddressed problem areas affecting TUNS; (3) to recommend changes or modifications of existing procedures as necessary in order to effect corrections for the overall benefit of NASA TUNS database production, implementation, and utilization; and (4) to recommend the addition of alternative procedures, routines, and activities that will consolidate and facilitate the production, implementation, and utilization of the NASA TUNS database.
Deductive Optimization of Relational Data Storage
Optimizing the physical data storage and retrieval of data are two key
database management problems. In this paper, we propose a language that can
express a wide range of physical database layouts, going well beyond the row-
and column-based methods that are widely used in database management systems.
We use deductive synthesis to turn a high-level relational representation of a
database query into a highly optimized low-level implementation which operates
on a specialized layout of the dataset. We build a compiler for this language
and conduct experiments using a popular database benchmark, which show that
the performance of these specialized queries is competitive with a
state-of-the-art in-memory compiled database system.
B-LOG: A branch and bound methodology for the parallel execution of logic programs
We propose a computational methodology, "B-LOG", which offers the potential for an effective implementation of Logic Programming in a parallel computer. We also propose a weighting scheme to guide the search process through the graph and we apply the concepts of parallel "branch and bound" algorithms in order to perform a "best-first" search using an information theoretic bound. The concept of "session" is used to speed up the search process in a succession of similar queries. Within a session, we strongly modify the bounds in a local database, while bounds kept in a global database are weakly modified to provide a better initial condition for other sessions. We also propose an implementation scheme based on a database machine using "semantic paging", and the "B-LOG processor" based on a scoreboard driven controller.
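The general idea of a bound-guided best-first search can be sketched sequentially as follows. This is a generic illustration, not B-LOG itself: it is not parallel, the bound is simply the accumulated edge weight rather than an information theoretic bound, and the graph and function name are hypothetical.

```python
import heapq


def best_first_search(graph, start, goal):
    """Expand the node with the smallest bound first; prune dominated nodes.

    graph maps a node to a list of (neighbour, weight) pairs with
    non-negative weights. Returns (cost, path) or None if goal is unreachable.
    """
    # Frontier entries: (bound on total cost, node, path taken so far).
    frontier = [(0, start, [start])]
    best_seen = {}
    while frontier:
        bound, node, path = heapq.heappop(frontier)
        if node == goal:
            return bound, path
        # Prune: this node was already expanded with an equal or better bound.
        if node in best_seen and best_seen[node] <= bound:
            continue
        best_seen[node] = bound
        for neighbour, weight in graph.get(node, []):
            heapq.heappush(frontier, (bound + weight, neighbour, path + [neighbour]))
    return None


# Hypothetical weighted search graph.
example_graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 1), ("d", 5)],
    "c": [("d", 1)],
}
```

Because the frontier is a priority queue ordered by the bound, the first time the goal is popped its cost is minimal, so the remaining branches never need to be explored.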
Provably-secure symmetric private information retrieval with quantum cryptography
Private information retrieval (PIR) is a database query protocol that
provides user privacy, in that the user can learn a particular entry of the
database of his interest but his query would be hidden from the data centre.
Symmetric private information retrieval (SPIR) takes PIR further by
additionally offering database privacy, where the user cannot learn any
additional entries of the database. Unconditionally secure SPIR solutions with
multiple databases are known classically, but are unrealistic because they
require long shared secret keys between the parties for secure communication
and shared randomness in the protocol. Here, we propose using quantum key
distribution (QKD) instead for a practical implementation, which can realise
both the secure communication and shared randomness requirements. We prove that
QKD maintains the security of the SPIR protocol and that it is also secure
against any external eavesdropper. We also show how such a classical-quantum
system could be implemented practically, using the example of a two-database
SPIR protocol with keys generated by measurement device-independent QKD.
Through key rate calculations, we show that such an implementation is feasible
at the metropolitan level with current QKD technology.
Comment: 19 pages
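For readers unfamiliar with multi-database PIR, the classical two-server XOR construction gives the flavour of the user-privacy half of the problem. The sketch below is only that baseline, assuming two non-colluding servers holding identical copies of a bit database; it provides neither the database privacy of SPIR nor any of the QKD-based key generation described in the abstract, and all function names are hypothetical.

```python
import secrets


def make_queries(n, i):
    """Build two selection vectors that differ only at the wanted index i.

    Each vector on its own is uniformly random, so a single server
    learns nothing about i.
    """
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[i] ^= 1
    return q1, q2


def answer(database, query):
    """Server side: XOR together the database bits selected by the query."""
    acc = 0
    for bit, selected in zip(database, query):
        if selected:
            acc ^= bit
    return acc


def retrieve(database, i):
    """User side: combine both answers; every bit except bit i cancels out."""
    q1, q2 = make_queries(len(database), i)
    return answer(database, q1) ^ answer(database, q2)


# Hypothetical database of bits replicated on both servers.
example_db = [1, 0, 1, 1, 0, 0, 1, 0]
```

In this baseline a dishonest user could send two vectors that differ in several positions and learn the XOR of several entries, which is one reason SPIR needs additional shared randomness between the servers.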
Enterprise-Level Database Implementation
The student will design, or will already have designed, an application requiring a robust database solution. This application may be a website, an app, a game, or a traditional application. The student will then design and implement a normalized database to accommodate the application using best practices, including assuring database normalization and designing and implementing appropriate stored procedures and functions.
Implementation of the FAA research and development electromagnetic database
The Idaho National Engineering Laboratory (INEL) has been assisting the FAA in developing a database of information about lightning. The FAA Research and Development Electromagnetic Database (FRED) will ultimately contain data from a variety of airborne and ground-based lightning research projects. An outline of the data currently available in FRED is presented. The data sources which the FAA intends to incorporate into FRED are listed. A description of how researchers may access and use the FRED menu system is also given.