44,241 research outputs found
Serializable Snapshot Isolation in PostgreSQL
This paper describes our experience implementing PostgreSQL's new
serializable isolation level. It is based on the recently-developed
Serializable Snapshot Isolation (SSI) technique. This is the first
implementation of SSI in a production database release as well as the first in
a database that did not previously have a lock-based serializable isolation
level. We reflect on our experience and describe how we overcame some of the
resulting challenges, including the implementation of a new lock manager, a
technique for ensuring memory usage is bounded, and integration with other
PostgreSQL features. We also introduce an extension to SSI that improves
performance for read-only transactions. We evaluate PostgreSQL's serializable
isolation level using several benchmarks and show that it achieves performance
only slightly below that of snapshot isolation, and significantly outperforms
the traditional two-phase locking approach on read-intensive workloads.Comment: VLDB201
Implications of non-volatile memory as primary storage for database management systems
Traditional Database Management System (DBMS) software relies on hard disks for storing relational data. Hard disks are cheap, persistent, and offer huge storage capacities. However, data retrieval latency for hard disks is extremely high. To hide this latency, DRAM is used as an intermediate storage. DRAM is significantly faster than disk, but deployed in smaller capacities due to cost and power constraints, and without the necessary persistency feature that disks have. Non-Volatile Memory (NVM) is an emerging storage class technology which promises the best of both worlds. It can offer large storage capacities, due to better scaling and cost metrics than DRAM, and is non-volatile (persistent) like hard disks. At the same time, its data retrieval time is much lower than that of hard disks and it is also byte-addressable like DRAM. In this paper, we explore the implications of employing NVM as primary storage for DBMS. In other words, we investigate the modifications necessary to be applied on a traditional relational DBMS to take advantage of NVM features. As a case study, we have modified the storage engine (SE) of PostgreSQL enabling efficient use of NVM hardware. We detail the necessary changes and challenges such modifications entail and evaluate them using a comprehensive emulation platform. Results indicate that our modified SE reduces query execution time by up to 40% and 14.4% when compared to disk and NVM storage, with average reductions of 20.5% and 4.5%, respectively.The research leading to these results has received funding from the European Union’s 7th Framework Programme under grant agreement number 318633, the Ministry of Science and Technology of Spain under contract TIN2015-65316-P, and a HiPEAC collaboration grant awarded to Naveed Ul Mustafa.Peer ReviewedPostprint (author's final draft
SmartUnit: Empirical Evaluations for Automated Unit Testing of Embedded Software in Industry
In this paper, we aim at the automated unit coverage-based testing for
embedded software. To achieve the goal, by analyzing the industrial
requirements and our previous work on automated unit testing tool CAUT, we
rebuild a new tool, SmartUnit, to solve the engineering requirements that take
place in our partner companies. SmartUnit is a dynamic symbolic execution
implementation, which supports statement, branch, boundary value and MC/DC
coverage. SmartUnit has been used to test more than one million lines of code
in real projects. For confidentiality motives, we select three in-house real
projects for the empirical evaluations. We also carry out our evaluations on
two open source database projects, SQLite and PostgreSQL, to test the
scalability of our tool since the scale of the embedded software project is
mostly not large, 5K-50K lines of code on average. From our experimental
results, in general, more than 90% of functions in commercial embedded software
achieve 100% statement, branch, MC/DC coverage, more than 80% of functions in
SQLite achieve 100% MC/DC coverage, and more than 60% of functions in
PostgreSQL achieve 100% MC/DC coverage. Moreover, SmartUnit is able to find the
runtime exceptions at the unit testing level. We also have reported exceptions
like array index out of bounds and divided-by-zero in SQLite. Furthermore, we
analyze the reasons of low coverage in automated unit testing in our setting
and give a survey on the situation of manual unit testing with respect to
automated unit testing in industry.Comment: In Proceedings of 40th International Conference on Software
Engineering: Software Engineering in Practice Track, Gothenburg, Sweden, May
27-June 3, 2018 (ICSE-SEIP '18), 10 page
Benchmarking database systems for Genomic Selection implementation
Motivation: With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. To make use of this information effectively requires DNA extraction facilities and marker production facilities that can efficiently deploy the desired set of markers across samples with a rapid turnaround time that allows for selection before crosses needed to be made. In reality, breeders often have a short window of time to make decisions by the time they are able to collect all their phenotyping data and receive corresponding genotyping data. This presents a challenge to organize information and utilize it in downstream analyses to support decisions made by breeders. In order to implement genomic selection routinely as part of breeding programs, one would need an efficient genotyping data storage system. We selected and benchmarked six popular open-source data storage systems, including relational database management and columnar storage systems. Results: We found that data extract times are greatly influenced by the orientation in which genotype data is stored in a system. HDF5 consistently performed best, in part because it can more efficiently work with both orientations of the allele matrix
Adding HL7 version 3 data types to PostgreSQL
The HL7 standard is widely used to exchange medical information
electronically. As a part of the standard, HL7 defines scalar communication
data types like physical quantity, point in time and concept descriptor but
also complex types such as interval types, collection types and probabilistic
types. Typical HL7 applications will store their communications in a database,
resulting in a translation from HL7 concepts and types into database types.
Since the data types were not designed to be implemented in a relational
database server, this transition is cumbersome and fraught with programmer
error. The purpose of this paper is two fold. First we analyze the HL7 version
3 data type definitions and define a number of conditions that must be met, for
the data type to be suitable for implementation in a relational database. As a
result of this analysis we describe a number of possible improvements in the
HL7 specification. Second we describe an implementation in the PostgreSQL
database server and show that the database server can effectively execute
scientific calculations with units of measure, supports a large number of
operations on time points and intervals, and can perform operations that are
akin to a medical terminology server. Experiments on synthetic data show that
the user defined types perform better than an implementation that uses only
standard data types from the database server.Comment: 12 pages, 9 figures, 6 table
An Injection with Tree Awareness: Adding Staircase Join to PostgreSQL
The syntactic wellformedness constraints of XML (opening and closing tags nest properly) imply that XML processors face the challenge to efficiently handle data that takes the shape of ordered, unranked trees. Although RDBMSs have originally been designed to manage table-shaped data, we propose their use as XML and XPath processors. In our setup, the database system employs a relational XML document encoding, the XPath accelerator [1], which maps information about the XML node hierarchy to a table, thus making it possible to evaluate XPath expressions on SQL hosts.\ud
\ud
Conventional RDBMSs, nevertheless, remain ignorant of many interesting properties of the encoded tree data, and were thus found to make no or poor use of these properties. This is why we devised a new join algorithm, staircase join [2], which incorporates the tree-specific knowledge required for an efficient SQL-based evaluation of XPath expressions. In a sense, this demonstration delivers the promise we have made at VLDB 2003 [2]: a notion of tree awareness can be injected into a conventional disk-based RDBMS kernel in terms of staircase join. The demonstration features a side-by-side comparison of both, an original and a staircase-join enhanced instance of PostgreSQL [4]. The required changes to\ud
PostgreSQL were local, the achieved eect, however, is significant: the demonstration proves that this injection of tree awareness turns PostgreSQL into a high-performance XML processor that closely adheres to the XPath semantics
Neo: A Learned Query Optimizer
Query optimization is one of the most challenging problems in database
systems. Despite the progress made over the past decades, query optimizers
remain extremely complex components that require a great deal of hand-tuning
for specific workloads and datasets. Motivated by this shortcoming and inspired
by recent advances in applying machine learning to data management challenges,
we introduce Neo (Neural Optimizer), a novel learning-based query optimizer
that relies on deep neural networks to generate query executions plans. Neo
bootstraps its query optimization model from existing optimizers and continues
to learn from incoming queries, building upon its successes and learning from
its failures. Furthermore, Neo naturally adapts to underlying data patterns and
is robust to estimation errors. Experimental results demonstrate that Neo, even
when bootstrapped from a simple optimizer like PostgreSQL, can learn a model
that offers similar performance to state-of-the-art commercial optimizers, and
in some cases even surpass them
- …
