SMaSH: A Benchmarking Toolkit for Human Genome Variant Calling
Motivation: Computational methods are essential to extract actionable
information from raw sequencing data, and to thus fulfill the promise of
next-generation sequencing technology. Unfortunately, computational tools
developed to call variants from human sequencing data disagree on many of their
predictions, and current methods to evaluate accuracy and computational
performance are ad-hoc and incomplete. Agreement on benchmarking variant
calling methods would stimulate development of genomic processing tools and
facilitate communication among researchers.
Results: We propose SMaSH, a benchmarking methodology for evaluating human
genome variant calling algorithms. We generate synthetic datasets, organize and
interpret a wide range of existing benchmarking data for real genomes, and
propose a set of accuracy and computational performance metrics for evaluating
variant calling methods on this benchmarking data. Moreover, we illustrate the
utility of SMaSH to evaluate the performance of some leading single nucleotide
polymorphism (SNP), indel, and structural variant calling algorithms.
Availability: We provide free and open access online to the SMaSH toolkit,
along with detailed documentation, at smash.cs.berkeley.edu.
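
The abstract above mentions accuracy metrics for variant calls without specifying them. As a minimal sketch of the general idea, and not SMaSH's own implementation, the following compares a set of called variants against a truth set by exact match and reports precision and recall. The `Variant` type, the `accuracy_metrics` function, and the exact-match rule are assumptions made for illustration only; real benchmarks typically also normalize variant representations before comparing.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a SMaSH type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def accuracy_metrics(called: Iterable[Variant], truth: Iterable[Variant]) -> dict:
    """Compare called variants to a truth set by exact match.

    Assumption: two variants are the same only if chromosome, position,
    reference allele, and alternate allele all agree.
    """
    called_set, truth_set = set(called), set(truth)
    tp = len(called_set & truth_set)   # called and present in truth
    fp = len(called_set - truth_set)   # called but absent from truth
    fn = len(truth_set - called_set)   # in truth but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


if __name__ == "__main__":
    truth = [Variant("chr1", 100, "A", "G"),
             Variant("chr1", 250, "C", "T"),
             Variant("chr2", 40, "G", "GA")]
    called = [Variant("chr1", 100, "A", "G"),
              Variant("chr1", 250, "C", "T"),
              Variant("chr2", 99, "T", "C")]
    print(accuracy_metrics(called, truth))
```

Exact matching is the simplest possible comparison rule; it undercounts true positives whenever the same indel is written in two equivalent ways, which is one reason a shared benchmarking methodology is useful.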
VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research
Accurate variant calling in next-generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly with sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon-aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research.
A Labelled Sequent Calculus for BBI: Proof Theory and Proof Search
We present a labelled sequent calculus for Boolean BI, a classical variant of
O'Hearn and Pym's logic of Bunched Implication. The calculus is simple, sound,
complete, and enjoys cut-elimination. We show that all the structural rules in
our proof system, including those rules that manipulate labels, can be
localised around applications of certain logical rules, thereby localising the
handling of these rules in proof search. Based on this, we demonstrate a free
variable calculus that deals with the structural rules lazily in a constraint
system. A heuristic method to solve the constraints is proposed in the end,
with some experimental results.
SVIM: Structural Variant Identification using Mapped Long Reads
Motivation: Structural variants are defined as genomic variants larger than 50 bp. They have been shown to affect more bases in any given genome than SNPs or small indels. Additionally, they have great impact on human phenotype and diversity and have been linked to numerous diseases. Due to their size and association with repeats, they are difficult to detect by shotgun sequencing, especially when based on short reads. Long read, single molecule sequencing technologies like those offered by Pacific Biosciences or Oxford Nanopore Technologies produce reads with a length of several thousand base pairs. Despite the higher error rate and sequencing cost, long read sequencing offers many advantages for the detection of structural variants. Yet, available software tools still do not fully exploit the possibilities.
Results: We present SVIM, a tool for the sensitive detection and precise characterization of structural variants from long read data. SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes including similar types, such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and real datasets from PacBio and Nanopore sequencing machines.
Availability and implementation: The source code and executables of SVIM are available on GitHub: github.com/eldariont/svim. SVIM has been implemented in Python 3 and published on bioconda and the Python Package Index.
Supplementary information: Supplementary data are available at Bioinformatics online.
Undecidability of Multiplicative Subexponential Logic
Subexponential logic is a variant of linear logic with a family of
exponential connectives--called subexponentials--that are indexed and arranged
in a pre-order. Each subexponential has or lacks associated structural
properties of weakening and contraction. We show that classical propositional
multiplicative linear logic extended with one unrestricted and two incomparable
linear subexponentials can encode the halting problem for two register Minsky
machines, and is hence undecidable. Comment: In Proceedings LINEARITY 2014, arXiv:1502.0441
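
For readers unfamiliar with the machine model named in the abstract above, here is a minimal sketch of a two-register Minsky machine interpreter. It illustrates only the model whose halting problem is encoded, not the paper's encoding into subexponential logic. The instruction format (`"inc"`, `"dec"`, `"halt"` tuples), the function name `run_minsky`, and the `max_steps` cutoff are assumptions made for this example; the cutoff is needed because halting cannot be decided in general.

```python
from typing import Optional, Sequence, Tuple


def run_minsky(program: Sequence[tuple], a: int = 0, b: int = 0,
               max_steps: int = 10_000) -> Optional[Tuple[int, int]]:
    """Run a two-register Minsky machine (hypothetical instruction format).

    Instructions:
      ("inc", r, nxt)           increment register r, jump to nxt
      ("dec", r, nxt, if_zero)  if register r > 0: decrement, jump to nxt;
                                otherwise jump to if_zero
      ("halt",)                 stop and return the registers

    Returns the final (a, b), or None if no halt within max_steps.
    """
    regs = [a, b]
    pc = 0
    for _ in range(max_steps):
        instr = program[pc]
        if instr[0] == "halt":
            return regs[0], regs[1]
        if instr[0] == "inc":
            _, r, nxt = instr
            regs[r] += 1
            pc = nxt
        else:  # "dec" with zero test
            _, r, nxt, if_zero = instr
            if regs[r] > 0:
                regs[r] -= 1
                pc = nxt
            else:
                pc = if_zero
    return None  # step budget exhausted; the machine may or may not halt


if __name__ == "__main__":
    # Adds register 0 into register 1, leaving register 0 empty.
    add = [("dec", 0, 1, 2), ("inc", 1, 0), ("halt",)]
    print(run_minsky(add, a=3, b=4))
```

The step budget makes the interpreter total, at the cost of returning `None` for long-running programs as well as for non-terminating ones.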
On the strictness of the quantifier structure hierarchy in first-order logic
We study a natural hierarchy in first-order logic, namely the quantifier
structure hierarchy, which gives a systematic classification of first-order
formulas based on structural quantifier resource. We define a variant of
Ehrenfeucht-Fraisse games that characterizes quantifier classes and use it to
prove that this hierarchy is strict over finite structures, using strategy
compositions. Moreover, we prove that this hierarchy is strict even over
ordered finite structures, which is interesting in the context of descriptive
complexity. Comment: 38 pages, 8 figures
NLC-2 graph recognition and isomorphism
NLC-width is a variant of clique-width with many applications in graph
algorithmics. This paper is devoted to graphs of NLC-width two. After giving new
structural properties of the class, we propose a -time algorithm,
improving Johansson's algorithm \cite{Johansson00}. Moreover, our algorithm is
simple to understand. The above properties and algorithm allow us to propose a
robust -time isomorphism algorithm for NLC-2 graphs. As far as we
know, it is the first polynomial-time algorithm. Comment: submitted to WG 2007; 12
