14,690 research outputs found
VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research
Accurate variant calling in next generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly to sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research
SMaSH: A Benchmarking Toolkit for Human Genome Variant Calling
Motivation: Computational methods are essential to extract actionable
information from raw sequencing data, and to thus fulfill the promise of
next-generation sequencing technology. Unfortunately, computational tools
developed to call variants from human sequencing data disagree on many of their
predictions, and current methods to evaluate accuracy and computational
performance are ad-hoc and incomplete. Agreement on benchmarking variant
calling methods would stimulate development of genomic processing tools and
facilitate communication among researchers.
Results: We propose SMaSH, a benchmarking methodology for evaluating human
genome variant calling algorithms. We generate synthetic datasets, organize and
interpret a wide range of existing benchmarking data for real genomes, and
propose a set of accuracy and computational performance metrics for evaluating
variant calling methods on this benchmarking data. Moreover, we illustrate the
utility of SMaSH to evaluate the performance of some leading single nucleotide
polymorphism (SNP), indel, and structural variant calling algorithms.
Availability: We provide free and open access online to the SMaSH toolkit,
along with detailed documentation, at smash.cs.berkeley.edu
Development and Verification of a Flight Stack for a High-Altitude Glider in Ada/SPARK 2014
SPARK 2014 is a modern programming language and a new state-of-the-art tool
set for development and verification of high-integrity software. In this paper,
we explore the capabilities and limitations of its latest version in the
context of building a flight stack for a high-altitude unmanned glider. Towards
that, we deliberately applied static analysis early and continuously during
implementation, to give verification the possibility to steer the software
design. In this process we have identified several limitations and pitfalls of
software design and verification in SPARK, for which we give workarounds and
protective actions to avoid them. Finally, we give design recommendations that
have proven effective for verification, and summarize our experiences with this
new language
Kronos: a workflow assembler for genome analytics and informatics.
BackgroundThe field of next-generation sequencing informatics has matured to a point where algorithmic advances in sequence alignment and individual feature detection methods have stabilized. Practical and robust implementation of complex analytical workflows (where such tools are structured into "best practices" for automated analysis of next-generation sequencing datasets) still requires significant programming investment and expertise.ResultsWe present Kronos, a software platform for facilitating the development and execution of modular, auditable, and distributable bioinformatics workflows. Kronos obviates the need for explicit coding of workflows by compiling a text configuration file into executable Python applications. Making analysis modules would still require programming. The framework of each workflow includes a run manager to execute the encoded workflows locally (or on a cluster or cloud), parallelize tasks, and log all runtime events. The resulting workflows are highly modular and configurable by construction, facilitating flexible and extensible meta-applications that can be modified easily through configuration file editing. The workflows are fully encoded for ease of distribution and can be instantiated on external systems, a step toward reproducible research and comparative analyses. We introduce a framework for building Kronos components that function as shareable, modular nodes in Kronos workflows.ConclusionsThe Kronos platform provides a standard framework for developers to implement custom tools, reuse existing tools, and contribute to the community at large. Kronos is shipped with both Docker and Amazon Web Services Machine Images. It is free, open source, and available through the Python Package Index and at https://github.com/jtaghiyar/kronos
ISOWN: accurate somatic mutation identification in the absence of normal tissue controls.
BackgroundA key step in cancer genome analysis is the identification of somatic mutations in the tumor. This is typically done by comparing the genome of the tumor to the reference genome sequence derived from a normal tissue taken from the same donor. However, there are a variety of common scenarios in which matched normal tissue is not available for comparison.ResultsIn this work, we describe an algorithm to distinguish somatic single nucleotide variants (SNVs) in next-generation sequencing data from germline polymorphisms in the absence of normal samples using a machine learning approach. Our algorithm was evaluated using a family of supervised learning classifications across six different cancer types and ~1600 samples, including cell lines, fresh frozen tissues, and formalin-fixed paraffin-embedded tissues; we tested our algorithm with both deep targeted and whole-exome sequencing data. Our algorithm correctly classified between 95 and 98% of somatic mutations with F1-measure ranges from 75.9 to 98.6% depending on the tumor type. We have released the algorithm as a software package called ISOWN (Identification of SOmatic mutations Without matching Normal tissues).ConclusionsIn this work, we describe the development, implementation, and validation of ISOWN, an accurate algorithm for predicting somatic mutations in cancer tissues in the absence of matching normal tissues. ISOWN is available as Open Source under Apache License 2.0 from https://github.com/ikalatskaya/ISOWN
- …
