A Simple Data-Adaptive Probabilistic Variant Calling Model
Background: Several sources of noise obfuscate the identification of single
nucleotide variation (SNV) in next generation sequencing data. For instance,
errors may be introduced during library construction and sequencing steps. In
addition, the reference genome and the algorithms used for the alignment of the
reads are further critical factors determining the efficacy of variant calling
methods. It is crucial to account for these factors in individual sequencing
experiments.
Results: We introduce a simple data-adaptive model for variant calling. This
model automatically adjusts to specific factors such as alignment errors. To
achieve this, several characteristics are sampled from sites with low mismatch
rates, and these are used to estimate empirical log-likelihoods. These
likelihoods are then combined into a score that typically gives rise to a
mixture distribution, from which we determine a decision threshold to separate
potentially variant sites from the noisy background.
Conclusions: In simulations we show that our simple proposed model is
competitive with frequently used, much more complex SNV calling algorithms in
terms of sensitivity and specificity. It performs particularly well in cases
with low allele frequencies. The application to next-generation sequencing data
reveals stark differences between the score distributions, indicating a strong
influence of data-specific sources of noise. The proposed model is specifically
designed to adjust to these differences.
Comment: 19 pages, 6 figures
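The pipeline the abstract describes (empirical background likelihoods, a per-site score whose distribution over all sites is a mixture, and a threshold separating the components) can be sketched roughly as follows. This is an illustrative reconstruction on simulated data, not the authors' model; the coverage, error rate, and quantile choices are assumptions.

```python
import math
import random

random.seed(1)
COVERAGE = 30  # assumed read depth per site

def empirical_log_likelihood(background_counts, max_count):
    """Empirical log P(mismatch count) estimated from low-mismatch background
    sites, with additive smoothing so unseen counts keep finite mass."""
    n = len(background_counts)
    return {c: math.log((background_counts.count(c) + 1) / (n + max_count + 1))
            for c in range(max_count + 1)}

# Simulated mismatch counts per site: background noise (error rate ~1%)
# versus heterozygous variant sites (allele frequency 0.5).
background = [sum(random.random() < 0.01 for _ in range(COVERAGE)) for _ in range(2000)]
variants = [sum(random.random() < 0.5 for _ in range(COVERAGE)) for _ in range(50)]
sites = background + variants

logp = empirical_log_likelihood(background[:1000], COVERAGE)

# Score each site by its improbability under the background model; over all
# sites the scores form a mixture with a noisy component and a variant component.
scores = [-logp[c] for c in sites]

# Decision threshold: an upper quantile of the background score distribution.
bg_scores = sorted(-logp[c] for c in background[:1000])
threshold = bg_scores[int(0.999 * len(bg_scores)) - 1]
called = [i for i, s in enumerate(scores) if s > threshold]
```

With these settings almost all simulated variant sites (indices 2000 and above) exceed the threshold while background sites rarely do, mirroring the separation of mixture components described above.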
Bayesian Optimization for Probabilistic Programs
We present the first general purpose framework for marginal maximum a
posteriori estimation of probabilistic program variables. By using a series of
code transformations, the evidence of any probabilistic program, and therefore
of any graphical model, can be optimized with respect to an arbitrary subset of
its sampled variables. To carry out this optimization, we develop the first
Bayesian optimization package to directly exploit the source code of its
target, leading to innovations in problem-independent hyperpriors, unbounded
optimization, and implicit constraint satisfaction; delivering significant
performance improvements over prominent existing packages. We present
applications of our method to a number of tasks including engineering design
and parameter optimization.
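For readers unfamiliar with the Bayesian optimization loop at the core of such a system, a minimal self-contained sketch follows. This is not the authors' package: it uses a plain Gaussian-process surrogate with an upper-confidence-bound acquisition on a 1-D grid, and the kernel, length scale, and iteration counts are all assumed for illustration.

```python
import numpy as np

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel between two 1-D point sets (assumed length scale)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """Zero-mean GP posterior mean and variance at query points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    alpha = np.linalg.solve(K, y)
    mu = Ks.T @ alpha
    v = np.linalg.solve(K, Ks)
    var = np.diag(rbf(Xs, Xs) - Ks.T @ v)
    return mu, np.maximum(var, 1e-12)

def bayes_opt(f, n_init=3, n_iter=15, seed=0):
    """Maximize f on [0, 1]: fit surrogate, pick the point maximizing UCB, repeat."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 1, n_init)
    y = np.array([f(x) for x in X])
    grid = np.linspace(0, 1, 201)
    for _ in range(n_iter):
        mu, var = gp_posterior(X, y, grid)
        ucb = mu + 2.0 * np.sqrt(var)   # explore where uncertain, exploit where promising
        x_next = grid[np.argmax(ucb)]
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))
    return X[np.argmax(y)], y.max()

# Toy objective with its maximum at x = 0.55.
x_best, y_best = bayes_opt(lambda x: -(x - 0.55) ** 2)
```

The paper's contribution goes well beyond this loop (hyperpriors derived from the program source, unbounded domains, implicit constraints), but the evaluate-fit-acquire cycle above is the shared skeleton.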
Advanced Probabilistic Couplings for Differential Privacy
Differential privacy is a promising formal approach to data privacy, which
provides a quantitative bound on the privacy cost of an algorithm that operates
on sensitive information. Several tools have been developed for the formal
verification of differentially private algorithms, including program logics and
type systems. However, these tools do not capture fundamental techniques that
have emerged in recent years, and cannot be used for reasoning about
cutting-edge differentially private algorithms. Existing techniques fail to
handle three broad classes of algorithms: 1) algorithms where privacy depends
on accuracy guarantees, 2) algorithms that are analyzed with the advanced
composition theorem, which shows slower growth in the privacy cost, and 3)
algorithms that interactively accept adaptive inputs.
We address these limitations with a new formalism extending apRHL, a
relational program logic that has been used for proving differential privacy of
non-interactive algorithms, and incorporating aHL, a (non-relational) program
logic for accuracy properties. We illustrate our approach through a single
running example, which exemplifies the three classes of algorithms and explores
new variants of the Sparse Vector technique, a well-studied algorithm from the
privacy literature. We implement our logic in EasyCrypt, and formally verify
privacy. We also introduce a novel coupling technique called \emph{optimal
subset coupling} that may be of independent interest.
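To see why the advanced composition theorem (class 2 above) matters quantitatively, the standard Dwork-Roth bounds can be compared directly; this numeric sketch is background material, not part of the paper's formalism.

```python
import math

def basic_composition(eps, k):
    """k-fold basic composition of eps-DP mechanisms: privacy cost grows linearly."""
    return k * eps

def advanced_composition(eps, k, delta_slack):
    """Advanced composition: the k-fold composition is (eps_total, delta_slack)-DP
    with eps_total growing like sqrt(k) rather than k."""
    return (eps * math.sqrt(2 * k * math.log(1 / delta_slack))
            + k * eps * math.expm1(eps))

eps, k, delta_slack = 0.1, 100, 1e-6
basic = basic_composition(eps, k)                       # 10.0
advanced = advanced_composition(eps, k, delta_slack)    # about 6.3
```

For 100 queries at epsilon 0.1, the advanced bound (about 6.3, paid for with a small extra failure probability delta) is well below the basic bound of 10, and the gap widens as k grows.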
Generic design of Chinese remaindering schemes
We propose a generic design for Chinese remainder algorithms. A Chinese
remainder computation consists in reconstructing an integer value from its
residues modulo non-coprime integers. We also propose an efficient linear data
structure, a radix ladder, for the intermediate storage and computations. Our
design is structured into three main modules: a black box residue computation
in charge of computing each residue; a Chinese remaindering controller in
charge of launching the computation and of the termination decision; an integer
builder in charge of the reconstruction computation. We then show that this
design enables many different forms of Chinese remaindering (e.g.
deterministic, early-terminated, distributed), easy comparisons between
these forms and e.g. user-transparent parallelism at different parallel grains
- …
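The module split described above (a black box producing residues, a controller deciding when to stop, a builder doing the reconstruction) can be made concrete with a minimal incremental sketch. For simplicity this assumes pairwise coprime prime moduli (the paper's design also covers the non-coprime case), and the stabilization-based early-termination rule and all names are illustrative, not the paper's.

```python
def ext_gcd(a, b):
    """Extended Euclid: returns (g, u, v) with a*u + b*v == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, u, v = ext_gcd(b, a % b)
    return g, v, u - (a // b) * v

def crt_combine(r1, m1, r2, m2):
    """Builder step: merge x = r1 (mod m1) with x = r2 (mod m2),
    assuming gcd(m1, m2) == 1; returns the new residue and modulus."""
    g, u, _ = ext_gcd(m1, m2)
    assert g == 1
    t = ((r2 - r1) * u) % m2        # m1*u = 1 (mod m2)
    return r1 + m1 * t, m1 * m2

def reconstruct(residue_of, moduli, stabilize=2):
    """Controller: feed in moduli one by one (residue_of is the black box) and
    terminate early once the value is unchanged for `stabilize` steps."""
    x, m = 0, 1
    stable = 0
    for p in moduli:
        new_x, m = crt_combine(x, m, residue_of(p), p)
        stable = stable + 1 if new_x == x else 0
        x = new_x
        if stable >= stabilize:
            break
    return x

secret = 123456789
primes = [101, 103, 107, 109, 113, 127, 131, 137, 139]
result = reconstruct(lambda p: secret % p, primes)   # recovers 123456789 early
```

Here the controller stops before consuming all nine primes: once the running modulus exceeds the value being reconstructed, further residues leave it unchanged, which is the signal the early-termination decision relies on.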