Efficiently Supporting Hierarchy and Data Updates in DNA Storage
We propose a novel and flexible DNA-storage architecture that provides the
notion of hierarchy among the objects tagged with the same primer pair and
enables efficient data updates. In contrast to prior work, in our architecture
a pair of PCR primers of length 20 defines not a single object but an
independent storage partition, which is managed internally with its own
index structure. We observe that, while the
number of mutually compatible primer pairs is limited, the internal address
space available to any pair of primers (i.e., partition) is virtually
unlimited. We expose and leverage the flexibility with which this address space
can be managed to provide rich and functional storage semantics, such as
hierarchical data organization and efficient and flexible implementations of
data updates. Furthermore, to leverage the full power of the prefix-based
nature of PCR addressing, we define a methodology for transforming an arbitrary
indexing scheme into a PCR-compatible equivalent. This allows us to run PCR
with primers that can be variably extended to include a desired part of the
index, and thus narrow down the scope of the reaction to retrieve a specific
object (e.g., file or directory) within the partition with high precision. Our
wetlab evaluation demonstrates the practicality of the proposed ideas and shows a
140x reduction in sequencing cost for retrieval of smaller objects within the
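The prefix-based addressing described in this abstract can be illustrated with a minimal sketch. It is not the authors' encoding: strands are modeled as plain strings, the fixed-width base-4 index layout, the example primer, and the helper names (`encode_level`, `make_strand`, `pcr_select`) are all assumptions made for illustration, and PCR selection is approximated by string prefix matching on the forward primer only. Biochemical constraints on real primers are ignored.

```python
BASES = "ACGT"


def encode_level(value, width=3):
    """Encode a non-negative integer as a fixed-width base-4 DNA string."""
    if not 0 <= value < 4 ** width:
        raise ValueError("value does not fit in the given width")
    digits = []
    for _ in range(width):
        digits.append(BASES[value % 4])
        value //= 4
    return "".join(reversed(digits))


def make_strand(primer, path, payload, width=3):
    """Build a strand as primer + hierarchical index + payload (hypothetical layout)."""
    index = "".join(encode_level(level, width) for level in path)
    return primer + index + payload


def pcr_select(pool, primer, path_prefix, width=3):
    """Approximate PCR with an extended primer: keep strands whose
    sequence starts with the primer followed by the encoded path prefix."""
    extended = primer + "".join(encode_level(level, width) for level in path_prefix)
    return [strand for strand in pool if strand.startswith(extended)]


# Hypothetical partition: one 20-base primer, paths of the form (directory, file).
PRIMER = "ACGTTGCAACGTTGCAACGT"
pool = [
    make_strand(PRIMER, (0, 0), "GGGG"),
    make_strand(PRIMER, (0, 1), "CCCC"),
    make_strand(PRIMER, (1, 0), "TTTT"),
]
whole_partition = pcr_select(pool, PRIMER, ())      # bare primer: every strand
directory_zero = pcr_select(pool, PRIMER, (0,))     # primer extended by one level
single_file = pcr_select(pool, PRIMER, (0, 1))      # primer extended by two levels
```

Extending the primer by more index levels narrows the selection from the whole partition to a directory and then to a single file, which is the behavior the abstract attributes to variably extended primers.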
The runtime performance of modern SAT solvers is deeply connected to the phase transition behavior of CNF formulas. While CNF solving has witnessed significant runtime improvements over the past two decades, the same does not hold for several other classes, such as conjunctions of cardinality and XOR constraints, denoted CARD-XOR formulas. Determining the satisfiability of CARD-XOR formulas is a fundamental problem with a wide variety of applications, ranging from discrete integration in artificial intelligence to maximum likelihood decoding in coding theory. The runtime behavior of random CARD-XOR formulas is unexplored in prior work. In this paper, we present the first rigorous empirical study to characterize the runtime behavior of 1-CARD-XOR formulas. We show empirical evidence of a surprising phase transition that follows a non-linear tradeoff between CARD and XOR constraints.
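To make the object of study concrete, the following is a minimal sketch of a random CARD-XOR instance and a brute-force satisfiability check. It assumes that a 1-CARD-XOR formula is a conjunction of XOR constraints with a single cardinality constraint of the form "at most `max_weight` variables are true", and that each random XOR includes every variable independently with probability 1/2; both are illustrative assumptions, not the paper's stated setup. The function names are hypothetical, and the exhaustive search is only meant for small `n`, not as a substitute for a solver.

```python
import itertools
import random


def random_xor(n, rng):
    """Draw a random XOR constraint over n variables as (variable indices, parity)."""
    variables = [i for i in range(n) if rng.random() < 0.5]
    parity = rng.randrange(2)
    return variables, parity


def satisfies(assignment, xors, max_weight):
    """Check the single cardinality constraint and every XOR constraint."""
    if sum(assignment) > max_weight:
        return False
    return all(sum(assignment[i] for i in variables) % 2 == parity
               for variables, parity in xors)


def is_satisfiable(n, xors, max_weight):
    """Exhaustively search all 2^n assignments (feasible only for small n)."""
    return any(satisfies(assignment, xors, max_weight)
               for assignment in itertools.product((0, 1), repeat=n))


# Example: estimate how often random instances are satisfiable for one setting.
rng = random.Random(0)
n, num_xors, max_weight, trials = 10, 4, 2, 50
sat_count = sum(
    is_satisfiable(n, [random_xor(n, rng) for _ in range(num_xors)], max_weight)
    for _ in range(trials)
)
sat_fraction = sat_count / trials
```

Sweeping `num_xors` and `max_weight` and plotting `sat_fraction` is one simple way to look for a satisfiability threshold on toy instances; the runtime behavior studied in the abstract would require timing an actual solver.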
Tolerant Testing of High-Dimensional Samplers with Subcube Conditioning
We study the tolerant testing problem for high-dimensional samplers. Given as
input two samplers $\mathcal{P}$ and $\mathcal{Q}$ over the $n$-dimensional
space $\{0,1\}^n$, and two parameters $\varepsilon_1 < \varepsilon_2$, the goal
of tolerant testing is to test whether the distributions generated by
$\mathcal{P}$ and $\mathcal{Q}$ are $\varepsilon_1$-close or
$\varepsilon_2$-far. Since exponential lower bounds (in $n$) are known for the
problem in the standard sampling model, research has focused on models where
one can draw \textit{conditional} samples.
Among these models, \textit{subcube conditioning} ($\mathsf{SUBCOND}$), which
allows conditioning on arbitrary subcubes of the domain, holds the promise of
widespread adoption in practice owing to its ability to capture the natural
behavior of samplers in constrained domains. To translate the promise into
practice, we need to overcome two crucial roadblocks for tests based on
$\mathsf{SUBCOND}$: the prohibitively large number of queries and the
limitation to non-tolerant testing (i.e., $\varepsilon_1 = 0$).
The primary contribution of this work is to overcome the above challenges: we
design a new tolerant testing methodology that
allows us to significantly improve the upper bound to
Managing reliability skew in DNA storage
Samplers are the backbone of the implementation of any randomized algorithm. Unfortunately, efficient algorithms for testing the correctness of samplers are very hard to find. Recently, a series of works obtained testers such as Barbarik, Teq, and Flash for particular kinds of samplers, such as CNF-samplers and Horn-samplers. However, their techniques have a significant limitation: one cannot expect to use them to test other samplers, such as perfect matching samplers or samplers for linear extensions of posets. In this paper, we present a new testing algorithm that works for such samplers and can estimate the distance of a new sampler from a known sampler (say, the uniform sampler). Testing the identity of distributions is at the heart of testing the correctness of samplers. This paper's main technical contribution is a new distance estimation algorithm for distributions over high-dimensional cubes using the recently proposed subcube conditioning sampling model. Given subcube conditioning access to an unknown distribution P and a known distribution Q defined over an n-dimensional Boolean hypercube, our algorithm CubeProbeEst estimates the variation distance between P and Q within additive error using subcube conditional samples from P. Following the testing-via-learning paradigm, we also get a tester that distinguishes, with high probability, between the cases when P and Q are close or far in variation distance using subcube conditional samples. The estimation algorithm CubeProbeEst helps us design the first tester for self-reducible samplers. The correctness of the tester is formally proved. Moreover, we implement CubeProbeEst to test the quality of three samplers for sampling linear extensions in posets.
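The subcube conditioning model mentioned above can be made concrete with a small sketch. This is not CubeProbeEst: it represents a distribution over the Boolean hypercube as an explicit dictionary, which is only feasible for tiny n, and computes the variation distance exactly rather than estimating it from samples. The function names and the dictionary representation are assumptions for illustration.

```python
import random


def subcube_condition(dist, fixed):
    """Restrict dist to the subcube where x[i] == b for every (i, b) in fixed,
    then renormalize. dist maps bit tuples to probabilities."""
    restricted = {x: p for x, p in dist.items()
                  if all(x[i] == b for i, b in fixed.items())}
    total = sum(restricted.values())
    if total == 0:
        raise ValueError("the subcube has zero probability mass")
    return {x: p / total for x, p in restricted.items()}


def subcond_sample(dist, fixed, rng):
    """Draw one sample from dist conditioned on the subcube given by fixed."""
    conditioned = subcube_condition(dist, fixed)
    points = list(conditioned)
    weights = [conditioned[x] for x in points]
    return rng.choices(points, weights=weights, k=1)[0]


def variation_distance(p, q):
    """Exact total variation distance between two explicit distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


# Example over the 2-dimensional hypercube: Q is uniform, P is skewed.
Q = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
P = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.0}
rng = random.Random(0)
sample = subcond_sample(P, {0: 1}, rng)  # condition on the first bit being 1
distance = variation_distance(P, Q)
```

A tester in this model only has sample access of the kind `subcond_sample` provides; the explicit dictionary and the exact `variation_distance` are used here solely to show what quantity is being estimated.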