541,684 research outputs found

    SMaSH: A Benchmarking Toolkit for Human Genome Variant Calling

    Full text link
    Motivation: Computational methods are essential to extract actionable information from raw sequencing data, and to thus fulfill the promise of next-generation sequencing technology. Unfortunately, computational tools developed to call variants from human sequencing data disagree on many of their predictions, and current methods to evaluate accuracy and computational performance are ad-hoc and incomplete. Agreement on benchmarking variant calling methods would stimulate development of genomic processing tools and facilitate communication among researchers. Results: We propose SMaSH, a benchmarking methodology for evaluating human genome variant calling algorithms. We generate synthetic datasets, organize and interpret a wide range of existing benchmarking data for real genomes, and propose a set of accuracy and computational performance metrics for evaluating variant calling methods on this benchmarking data. Moreover, we illustrate the utility of SMaSH to evaluate the performance of some leading single nucleotide polymorphism (SNP), indel, and structural variant calling algorithms. Availability: We provide free and open access online to the SMaSH toolkit, along with detailed documentation, at smash.cs.berkeley.edu

    VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research

    Get PDF
    Accurate variant calling in next generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly to sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research

    A Labelled Sequent Calculus for BBI: Proof Theory and Proof Search

    Full text link
    We present a labelled sequent calculus for Boolean BI, a classical variant of O'Hearn and Pym's logic of Bunched Implication. The calculus is simple, sound, complete, and enjoys cut-elimination. We show that all the structural rules in our proof system, including those rules that manipulate labels, can be localised around applications of certain logical rules, thereby localising the handling of these rules in proof search. Based on this, we demonstrate a free variable calculus that deals with the structural rules lazily in a constraint system. A heuristic method to solve the constraints is proposed in the end, with some experimental results

    SVIM: Structural Variant Identification using Mapped Long Reads

    No full text
    Motivation: Structural variants are defined as genomic variants larger than 50bp. They have been shown to affect more bases in any given genome than SNPs or small indels. Additionally, they have great impact on human phenotype and diversity and have been linked to numerous diseases. Due to their size and association with repeats, they are difficult to detect by shotgun sequencing, especially when based on short reads. Long read, single molecule sequencing technologies like those offered by Pacific Biosciences or Oxford Nanopore Technologies produce reads with a length of several thousand base pairs. Despite the higher error rate and sequencing cost, long read sequencing offers many advantages for the detection of structural variants. Yet, available software tools still do not fully exploit the possibilities. Results: We present SVIM, a tool for the sensitive detection and precise characterization of structural variants from long read data. SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes including similar types, such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and real datasets from PacBio and Nanopore sequencing machines. Availability and implementation: The source code and executables of SVIM are available on Github: github.com/eldariont/svim. SVIM has been implemented in Python 3 and published on bioconda and the Python Package Index. Supplementary information: Supplementary data are available at Bioinformatics online

    Undecidability of Multiplicative Subexponential Logic

    Get PDF
    Subexponential logic is a variant of linear logic with a family of exponential connectives--called subexponentials--that are indexed and arranged in a pre-order. Each subexponential has or lacks associated structural properties of weakening and contraction. We show that classical propositional multiplicative linear logic extended with one unrestricted and two incomparable linear subexponentials can encode the halting problem for two register Minsky machines, and is hence undecidable.Comment: In Proceedings LINEARITY 2014, arXiv:1502.0441

    On the strictness of the quantifier structure hierarchy in first-order logic

    Full text link
    We study a natural hierarchy in first-order logic, namely the quantifier structure hierarchy, which gives a systematic classification of first-order formulas based on structural quantifier resource. We define a variant of Ehrenfeucht-Fraisse games that characterizes quantifier classes and use it to prove that this hierarchy is strict over finite structures, using strategy compositions. Moreover, we prove that this hierarchy is strict even over ordered finite structures, which is interesting in the context of descriptive complexity.Comment: 38 pages, 8 figure

    NLC-2 graph recognition and isomorphism

    Get PDF
    NLC-width is a variant of clique-width with many application in graph algorithmic. This paper is devoted to graphs of NLC-width two. After giving new structural properties of the class, we propose a O(n2m)O(n^2 m)-time algorithm, improving Johansson's algorithm \cite{Johansson00}. Moreover, our alogrithm is simple to understand. The above properties and algorithm allow us to propose a robust O(n2m)O(n^2 m)-time isomorphism algorithm for NLC-2 graphs. As far as we know, it is the first polynomial-time algorithm.Comment: soumis \`{a} WG 2007; 12
    corecore