VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research

Author: Ahdesmaki Miika
Barrett J. Carl
Chapman Brad
Dougherty Brian
Dry Jonathan R.
Hofmann Oliver
Johnson Justin
Lai Zhongwu
Markovets Aleksandra
McEwen Robert
Publication venue: Oxford University Press (OUP)
Publication date: 07/04/2016

Accurate variant calling in next generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly to sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research.

Crossref

PubMed Central

Enlighten

University of Melbourne Institutional Repository

SMaSH: A Benchmarking Toolkit for Human Genome Variant Calling

Author: Bresler Ma'ayan
Curtis Kristal
Hartl Christopher
Jordan Michael I.
Liptrap Jesse
Newcomb Julie
Patterson David
Song Yun S.
Talwalkar Ameet
Terhorst Jonathan
Publication date: 05/01/2014

Motivation: Computational methods are essential to extract actionable information from raw sequencing data, and to thus fulfill the promise of next-generation sequencing technology. Unfortunately, computational tools developed to call variants from human sequencing data disagree on many of their predictions, and current methods to evaluate accuracy and computational performance are ad-hoc and incomplete. Agreement on benchmarking variant calling methods would stimulate development of genomic processing tools and facilitate communication among researchers. Results: We propose SMaSH, a benchmarking methodology for evaluating human genome variant calling algorithms. We generate synthetic datasets, organize and interpret a wide range of existing benchmarking data for real genomes, and propose a set of accuracy and computational performance metrics for evaluating variant calling methods on this benchmarking data. Moreover, we illustrate the utility of SMaSH to evaluate the performance of some leading single nucleotide polymorphism (SNP), indel, and structural variant calling algorithms. Availability: We provide free and open access online to the SMaSH toolkit, along with detailed documentation, at smash.cs.berkeley.edu

arXiv.org e-Print Archive

Crossref

PubMed Central

eScholarship - University of California

Development and Verification of a Flight Stack for a High-Altitude Glider in Ada/SPARK 2014

Author: A Burns
D Hoang
J Xiang
R Chapman
Publication date: 08/06/2017

SPARK 2014 is a modern programming language and a new state-of-the-art tool set for development and verification of high-integrity software. In this paper, we explore the capabilities and limitations of its latest version in the context of building a flight stack for a high-altitude unmanned glider. Towards that, we deliberately applied static analysis early and continuously during implementation, to give verification the possibility to steer the software design. In this process we have identified several limitations and pitfalls of software design and verification in SPARK, for which we give workarounds and protective actions to avoid them. Finally, we give design recommendations that have proven effective for verification, and summarize our experiences with this new language.

arXiv.org e-Print Archive

Crossref

Kronos: a workflow assembler for genome analytics and informatics.

Author: Aniba Radhouane
Bashashati Ali
Boutros Paul C
Grande Bruno M
Grewal Diljot
Grewal Jasleen
Morin Ryan D
Rosner Jamie
Shah Sohrab P
Taghiyar M Jafar
Publication venue: eScholarship, University of California
Publication date: 26/06/2017

Background: The field of next-generation sequencing informatics has matured to a point where algorithmic advances in sequence alignment and individual feature detection methods have stabilized. Practical and robust implementation of complex analytical workflows (where such tools are structured into "best practices" for automated analysis of next-generation sequencing datasets) still requires significant programming investment and expertise. Results: We present Kronos, a software platform for facilitating the development and execution of modular, auditable, and distributable bioinformatics workflows. Kronos obviates the need for explicit coding of workflows by compiling a text configuration file into executable Python applications. Making analysis modules would still require programming. The framework of each workflow includes a run manager to execute the encoded workflows locally (or on a cluster or cloud), parallelize tasks, and log all runtime events. The resulting workflows are highly modular and configurable by construction, facilitating flexible and extensible meta-applications that can be modified easily through configuration file editing. The workflows are fully encoded for ease of distribution and can be instantiated on external systems, a step toward reproducible research and comparative analyses. We introduce a framework for building Kronos components that function as shareable, modular nodes in Kronos workflows. Conclusions: The Kronos platform provides a standard framework for developers to implement custom tools, reuse existing tools, and contribute to the community at large. Kronos is shipped with both Docker and Amazon Web Services Machine Images. It is free, open source, and available through the Python Package Index and at https://github.com/jtaghiyar/kronos

Crossref

eScholarship - University of California

ISOWN: accurate somatic mutation identification in the absence of normal tissue controls.

Author: Bartlett John MS
Kalatskaya Irina
McPherson John D
Spears Melanie
Stein Lincoln
Trinh Quang M
Publication venue: eScholarship, University of California
Publication date: 01/06/2017

Background: A key step in cancer genome analysis is the identification of somatic mutations in the tumor. This is typically done by comparing the genome of the tumor to the reference genome sequence derived from a normal tissue taken from the same donor. However, there are a variety of common scenarios in which matched normal tissue is not available for comparison. Results: In this work, we describe an algorithm to distinguish somatic single nucleotide variants (SNVs) in next-generation sequencing data from germline polymorphisms in the absence of normal samples using a machine learning approach. Our algorithm was evaluated using a family of supervised learning classifications across six different cancer types and ~1600 samples, including cell lines, fresh frozen tissues, and formalin-fixed paraffin-embedded tissues; we tested our algorithm with both deep targeted and whole-exome sequencing data. Our algorithm correctly classified between 95 and 98% of somatic mutations, with F1-measures ranging from 75.9 to 98.6% depending on the tumor type. We have released the algorithm as a software package called ISOWN (Identification of SOmatic mutations Without matching Normal tissues). Conclusions: In this work, we describe the development, implementation, and validation of ISOWN, an accurate algorithm for predicting somatic mutations in cancer tissues in the absence of matching normal tissues. ISOWN is available as Open Source under Apache License 2.0 from https://github.com/ikalatskaya/ISOWN
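
For readers unfamiliar with the F1-measure quoted in this abstract, the following is a minimal sketch of how it is conventionally computed from classification counts (the harmonic mean of precision and recall). The function name `f1_score` and the example counts are illustrative assumptions and are not taken from ISOWN or its paper.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1-measure (harmonic mean of precision and recall).

    Hypothetical helper for illustration only; not part of ISOWN.
    Returns 0.0 when precision or recall is undefined or both are zero.
    """
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    if predicted_positive == 0 or actual_positive == 0:
        return 0.0

    precision = true_positives / predicted_positive  # fraction of calls that are correct
    recall = true_positives / actual_positive        # fraction of true variants recovered
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up counts: 90 somatic SNVs correctly called, 10 germline
    # variants miscalled as somatic, 30 somatic SNVs missed.
    print(f"F1 = {f1_score(90, 10, 30):.3f}")
```

Because F1 combines precision and recall, a classifier can recover most true somatic mutations and still have a noticeably lower F1 if many germline variants are miscalled as somatic.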

TSpace (University of Toronto)

Crossref

Directory of Open Access Journals

eScholarship - University of California