FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory
The advent of Storage Class Memory (SCM) is driving a rethink of storage systems towards a single-level architecture where memory and storage are merged. In this context, several works have investigated how to design persistent trees in SCM as a fundamental building block for these novel systems. However, these trees are significantly slower than DRAM-based counterparts, since trees are latency-sensitive and SCM exhibits higher latencies than DRAM. In this paper we propose a novel hybrid SCM-DRAM persistent and concurrent B-Tree, named Fingerprinting Persistent Tree (FPTree), that achieves similar performance to DRAM-based counterparts. In this novel design, leaf nodes are persisted in SCM while inner nodes are placed in DRAM and rebuilt upon recovery. The FPTree uses Fingerprinting, a technique that limits the expected number of in-leaf probed keys to one. In addition, we propose a hybrid concurrency scheme for the FPTree that is partially based on Hardware Transactional Memory. We conduct a thorough performance evaluation and show that the FPTree outperforms state-of-the-art persistent trees with different SCM latencies by up to a factor of 8.2. Moreover, we show that the FPTree scales very well on a machine with 88 logical cores. Finally, we integrate the evaluated trees in memcached and a prototype database. We show that the FPTree incurs an almost negligible performance overhead over using fully transient data structures, while significantly outperforming other persistent trees.
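The fingerprinting idea described above can be sketched as follows: each leaf stores a compact array of one-byte hashes ("fingerprints") of its keys, and a lookup scans that array before touching any full key. The sketch below is a minimal Python model of the probing logic only; the class and function names are illustrative, not the paper's actual implementation (which stores the fingerprint array in the first cache line of an unsorted SCM leaf).

```python
import hashlib

def fingerprint(key: bytes) -> int:
    # One-byte hash of a key; with up to 256 keys per leaf, a matching
    # fingerprint is expected to identify the right slot ~once per lookup.
    return hashlib.blake2b(key, digest_size=1).digest()[0]

class Leaf:
    """Simplified leaf node: fingerprints are scanned before full keys."""

    def __init__(self):
        self.fps = []   # one byte per stored key
        self.keys = []
        self.vals = []

    def insert(self, key: bytes, val):
        self.fps.append(fingerprint(key))
        self.keys.append(key)
        self.vals.append(val)

    def lookup(self, key: bytes):
        fp = fingerprint(key)
        # Scan the compact fingerprint array first; compare the full key
        # only when the cheap one-byte check matches.
        for i, f in enumerate(self.fps):
            if f == fp and self.keys[i] == key:
                return self.vals[i]
        return None
```

The point of the technique is cache behavior: the fingerprint array fits in one or two cache lines, so most non-matching keys are ruled out without loading them from (slower) SCM.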
Cache-Oblivious Persistence
Partial persistence is a general transformation that takes a data structure and allows queries to be executed on any past state of the structure. The cache-oblivious model is the leading model of a modern multi-level memory hierarchy. We present the first general transformation for making cache-oblivious model data structures partially persistent.
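To make the notion of partial persistence concrete, here is a minimal Python sketch of the classic "fat node" approach (unrelated to the cache-oblivious construction in the paper, and with hypothetical names): every cell keeps its full write history, updates always target the newest version, and a read may ask about any past version via binary search over that history.

```python
import bisect

class PartiallyPersistentArray:
    """Fat-node partial persistence: write to the present, read any past."""

    def __init__(self, n):
        self.version = 0
        # Per-cell histories, kept sorted by the version of each write.
        self.vers = [[0] for _ in range(n)]
        self.vals = [[None] for _ in range(n)]

    def set(self, i, value):
        # Updates are only allowed on the newest version.
        self.version += 1
        self.vers[i].append(self.version)
        self.vals[i].append(value)
        return self.version  # handle for later queries

    def get(self, i, version=None):
        # Queries may target any past version.
        if version is None:
            version = self.version
        idx = bisect.bisect_right(self.vers[i], version) - 1
        return self.vals[i][idx]
```

This transformation costs O(log m) per read after m writes; the challenge the paper addresses is achieving such a transformation while preserving cache-oblivious bounds.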
Inference of Ancestral Recombination Graphs through Topological Data Analysis
The recent explosion of genomic data has underscored the need for
interpretable and comprehensive analyses that can capture complex phylogenetic
relationships within and across species. Recombination, reassortment and
horizontal gene transfer constitute examples of pervasive biological phenomena
that cannot be captured by tree-like representations. Starting from hundreds of
genomes, we are interested in the reconstruction of potential evolutionary
histories leading to the observed data. Ancestral recombination graphs
represent potential histories that explicitly accommodate recombination and
mutation events across orthologous genomes. However, they are computationally
costly to reconstruct, usually being infeasible for more than a few tens of
genomes. Recently, Topological Data Analysis (TDA) methods have been proposed
as robust and scalable methods that can capture the genetic scale and frequency
of recombination. We build upon previous TDA developments for detecting and
quantifying recombination, and present a novel framework that can be applied to
hundreds of genomes and can be interpreted in terms of minimal histories of
mutation and recombination events, quantifying the scales and identifying the
genomic locations of recombinations. We implement this framework in a software
package, called TARGet, and apply it to several examples, including small
migration between different populations, human recombination, and horizontal
evolution in finches inhabiting the Galápagos Islands.

Comment: 33 pages, 12 figures. The accompanying software, instructions and example files used in the manuscript can be obtained from https://github.com/RabadanLab/TARGe
Inductive benchmarking for purely functional data structures
Every designer of a new data structure wants to know how well it performs in comparison with others. But finding, coding and testing applications as benchmarks can be tedious and time-consuming. Besides, how a benchmark uses a data structure may considerably affect its apparent efficiency, so the choice of applications may bias the results. We address these problems by developing a tool for inductive benchmarking. This tool, Auburn, can generate benchmarks across a wide distribution of uses. We precisely define 'the use of a data structure', upon which we build the core algorithms of Auburn: how to generate a benchmark from a description of use, and how to extract a description of use from an application. We then apply inductive classification techniques to obtain decision trees for the choice between competing data structures. We test Auburn by benchmarking several implementations of three common data structures: queues, random-access lists and heaps. These and other results show Auburn to be a useful and accurate tool, but they also reveal some limitations of the approach.
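The core idea of generating a benchmark from a "description of use" can be illustrated with a heavily simplified Python sketch (all names here are hypothetical; Auburn itself is a Haskell tool with a much richer notion of usage). A usage description is reduced to operation weights, from which a random operation trace is drawn; the same trace then drives competing implementations of the same abstract type, here two queues:

```python
import random

def gen_benchmark(profile, length, seed=0):
    # Draw an operation trace from a usage profile: a mapping from
    # operation name to relative frequency (a crude stand-in for a
    # full description of use).
    rng = random.Random(seed)
    ops, weights = zip(*profile.items())
    return rng.choices(ops, weights=weights, k=length)

def run_queue(trace, empty, snoc, tail):
    # Replay one trace against a purely functional queue given by its
    # empty value and snoc/tail operations.
    q, n = empty, 0
    for i, op in enumerate(trace):
        if op == "snoc":
            q = snoc(q, i)
            n += 1
        elif op == "tail" and n > 0:
            q = tail(q)
            n -= 1
    return q

# Implementation 1: naive list queue (tail copies the whole list).
naive_empty = []
naive_snoc = lambda q, x: q + [x]
naive_tail = lambda q: q[1:]

# Implementation 2: batched queue as a (front, reversed-rear) pair.
pair_empty = ([], [])

def pair_snoc(q, x):
    f, r = q
    return (f, [x] + r)

def pair_tail(q):
    f, r = q
    if not f:
        f, r = list(reversed(r)), []
    return (f[1:], r)
```

Timing each replay (e.g. with `time.perf_counter`) over many sampled profiles is what lets a tool like Auburn relate efficiency to patterns of use rather than to one fixed benchmark.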