A Simple Data-Adaptive Probabilistic Variant Calling Model
Background: Several sources of noise obfuscate the identification of single
nucleotide variation (SNV) in next generation sequencing data. For instance,
errors may be introduced during library construction and sequencing steps. In
addition, the reference genome and the algorithms used for the alignment of the
reads are further critical factors determining the efficacy of variant calling
methods. It is crucial to account for these factors in individual sequencing
experiments.
Results: We introduce a simple data-adaptive model for variant calling. This
model automatically adjusts to specific factors such as alignment errors. To
achieve this, several characteristics are sampled from sites with low mismatch
rates, and these are used to estimate empirical log-likelihoods. These
likelihoods are then combined into a score that typically gives rise to a
mixture distribution, from which we determine a decision threshold to separate
potentially variant sites from the noisy background.
Conclusions: In simulations we show that our simple proposed model is
competitive with frequently used, much more complex SNV calling algorithms in
terms of sensitivity and specificity. It performs particularly well in cases
with low allele frequencies. The application to next-generation sequencing data
reveals stark differences between the score distributions, indicating a strong
influence of data-specific sources of noise. The proposed model is specifically
designed to adjust to these differences.
Comment: 19 pages, 6 figures
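The pipeline the abstract describes (empirical background likelihoods, a per-site score whose distribution over all sites is a mixture, and a threshold separating the components) can be sketched roughly as follows. This is an illustrative reconstruction on simulated data, not the authors' model; the coverage, error rate, and quantile choices are assumptions.

```python
import math
import random

random.seed(1)
COVERAGE = 30  # assumed read depth per site

def empirical_log_likelihood(background_counts, max_count):
    """Empirical log P(mismatch count) estimated from low-mismatch background
    sites, with additive smoothing so unseen counts keep finite mass."""
    n = len(background_counts)
    return {c: math.log((background_counts.count(c) + 1) / (n + max_count + 1))
            for c in range(max_count + 1)}

# Simulated mismatch counts per site: background noise (error rate ~1%)
# versus heterozygous variant sites (allele frequency 0.5).
background = [sum(random.random() < 0.01 for _ in range(COVERAGE)) for _ in range(2000)]
variants = [sum(random.random() < 0.5 for _ in range(COVERAGE)) for _ in range(50)]
sites = background + variants

logp = empirical_log_likelihood(background[:1000], COVERAGE)

# Score each site by its improbability under the background model; over all
# sites the scores form a mixture with a noisy component and a variant component.
scores = [-logp[c] for c in sites]

# Decision threshold: an upper quantile of the background score distribution.
bg_scores = sorted(-logp[c] for c in background[:1000])
threshold = bg_scores[int(0.999 * len(bg_scores)) - 1]
called = [i for i, s in enumerate(scores) if s > threshold]
```

With these settings almost all simulated variant sites (indices 2000 and above) exceed the threshold while background sites rarely do, mirroring the separation of mixture components described above.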
Bayesian Optimization for Probabilistic Programs
We present the first general purpose framework for marginal maximum a
posteriori estimation of probabilistic program variables. By using a series of
code transformations, the evidence of any probabilistic program, and therefore
of any graphical model, can be optimized with respect to an arbitrary subset of
its sampled variables. To carry out this optimization, we develop the first
Bayesian optimization package to directly exploit the source code of its
target, leading to innovations in problem-independent hyperpriors, unbounded
optimization, and implicit constraint satisfaction; delivering significant
performance improvements over prominent existing packages. We present
applications of our method to a number of tasks including engineering design
and parameter optimization.
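For readers unfamiliar with the Bayesian optimization loop at the core of such a system, a minimal self-contained sketch follows. This is not the authors' package: it uses a plain Gaussian-process surrogate with an upper-confidence-bound acquisition on a 1-D grid, and the kernel, length scale, and iteration counts are all assumed for illustration.

```python
import numpy as np

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel between two 1-D point sets (assumed length scale)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """Zero-mean GP posterior mean and variance at query points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    alpha = np.linalg.solve(K, y)
    mu = Ks.T @ alpha
    v = np.linalg.solve(K, Ks)
    var = np.diag(rbf(Xs, Xs) - Ks.T @ v)
    return mu, np.maximum(var, 1e-12)

def bayes_opt(f, n_init=3, n_iter=15, seed=0):
    """Maximize f on [0, 1]: fit surrogate, pick the point maximizing UCB, repeat."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 1, n_init)
    y = np.array([f(x) for x in X])
    grid = np.linspace(0, 1, 201)
    for _ in range(n_iter):
        mu, var = gp_posterior(X, y, grid)
        ucb = mu + 2.0 * np.sqrt(var)   # explore where uncertain, exploit where promising
        x_next = grid[np.argmax(ucb)]
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))
    return X[np.argmax(y)], y.max()

# Toy objective with its maximum at x = 0.55.
x_best, y_best = bayes_opt(lambda x: -(x - 0.55) ** 2)
```

The paper's contribution goes well beyond this loop (hyperpriors derived from the program source, unbounded domains, implicit constraints), but the evaluate-fit-acquire cycle above is the shared skeleton.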
Advanced Probabilistic Couplings for Differential Privacy
Differential privacy is a promising formal approach to data privacy, which
provides a quantitative bound on the privacy cost of an algorithm that operates
on sensitive information. Several tools have been developed for the formal
verification of differentially private algorithms, including program logics and
type systems. However, these tools do not capture fundamental techniques that
have emerged in recent years, and cannot be used for reasoning about
cutting-edge differentially private algorithms. Existing techniques fail to
handle three broad classes of algorithms: 1) algorithms where privacy depends
on accuracy guarantees, 2) algorithms that are analyzed with the advanced
composition theorem, which shows slower growth in the privacy cost, and 3)
algorithms that interactively accept adaptive inputs.
We address these limitations with a new formalism extending apRHL, a
relational program logic that has been used for proving differential privacy of
non-interactive algorithms, and incorporating aHL, a (non-relational) program
logic for accuracy properties. We illustrate our approach through a single
running example, which exemplifies the three classes of algorithms and explores
new variants of the Sparse Vector technique, a well-studied algorithm from the
privacy literature. We implement our logic in EasyCrypt, and formally verify
privacy. We also introduce a novel coupling technique called \emph{optimal
subset coupling} that may be of independent interest.
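To see why the advanced composition theorem (class 2 above) matters quantitatively, the standard Dwork-Roth bounds can be compared directly; this numeric sketch is background material, not part of the paper's formalism.

```python
import math

def basic_composition(eps, k):
    """k-fold basic composition of eps-DP mechanisms: privacy cost grows linearly."""
    return k * eps

def advanced_composition(eps, k, delta_slack):
    """Advanced composition: the k-fold composition is (eps_total, delta_slack)-DP
    with eps_total growing like sqrt(k) rather than k."""
    return (eps * math.sqrt(2 * k * math.log(1 / delta_slack))
            + k * eps * math.expm1(eps))

eps, k, delta_slack = 0.1, 100, 1e-6
basic = basic_composition(eps, k)                       # 10.0
advanced = advanced_composition(eps, k, delta_slack)    # about 6.3
```

For 100 queries at epsilon 0.1, the advanced bound (about 6.3, paid for with a small extra failure probability delta) is well below the basic bound of 10, and the gap widens as k grows.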
Generic design of Chinese remaindering schemes
We propose a generic design for Chinese remainder algorithms. A Chinese
remainder computation consists in reconstructing an integer value from its
residues modulo non-coprime integers. We also propose an efficient linear data
structure, a radix ladder, for the intermediate storage and computations. Our
design is structured into three main modules: a black box residue computation
in charge of computing each residue; a Chinese remaindering controller in
charge of launching the computation and of the termination decision; an integer
builder in charge of the reconstruction computation. We then show that this
design enables many different forms of Chinese remaindering (e.g.
deterministic, early-terminated, distributed), easy comparisons between
these forms and e.g. user-transparent parallelism at different parallel grains
- …
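The module split described above (a black box producing residues, a controller deciding when to stop, a builder doing the reconstruction) can be made concrete with a minimal incremental sketch. For simplicity this assumes pairwise coprime prime moduli (the paper's design also covers the non-coprime case), and the stabilization-based early-termination rule and all names are illustrative, not the paper's.

```python
def ext_gcd(a, b):
    """Extended Euclid: returns (g, u, v) with a*u + b*v == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, u, v = ext_gcd(b, a % b)
    return g, v, u - (a // b) * v

def crt_combine(r1, m1, r2, m2):
    """Builder step: merge x = r1 (mod m1) with x = r2 (mod m2),
    assuming gcd(m1, m2) == 1; returns the new residue and modulus."""
    g, u, _ = ext_gcd(m1, m2)
    assert g == 1
    t = ((r2 - r1) * u) % m2        # m1*u = 1 (mod m2)
    return r1 + m1 * t, m1 * m2

def reconstruct(residue_of, moduli, stabilize=2):
    """Controller: feed in moduli one by one (residue_of is the black box) and
    terminate early once the value is unchanged for `stabilize` steps."""
    x, m = 0, 1
    stable = 0
    for p in moduli:
        new_x, m = crt_combine(x, m, residue_of(p), p)
        stable = stable + 1 if new_x == x else 0
        x = new_x
        if stable >= stabilize:
            break
    return x

secret = 123456789
primes = [101, 103, 107, 109, 113, 127, 131, 137, 139]
result = reconstruct(lambda p: secret % p, primes)   # recovers 123456789 early
```

Here the controller stops before consuming all nine primes: once the running modulus exceeds the value being reconstructed, further residues leave it unchanged, which is the signal the early-termination decision relies on.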