
    Evaluating the performance of tools used to call minority variants from whole genome short-read data.

    Background: High-throughput whole genome sequencing facilitates investigation of minority virus sub-populations from virus-positive samples. Minority variants are useful in understanding within- and between-host diversity and population dynamics, and can potentially assist in elucidating person-to-person transmission pathways. Several minority variant callers have been developed to describe low-frequency sub-populations from whole genome sequence data. These callers differ in the bioinformatics and statistical methods used to discriminate sequencing errors from low-frequency variants. Methods: We evaluated the diagnostic performance of, and concordance between, published minority variant callers used to identify minority variants in whole-genome sequence data from virus samples. We used the ART-Illumina read simulation tool to generate three artificial short-read datasets of varying coverage and error profiles from an RSV reference genome. The datasets were spiked with nucleotide variants at predetermined positions and frequencies. Variants were called using FreeBayes, LoFreq, VarDict, and VarScan2. The variant callers' agreement in identifying known variants was quantified using two measures: concordance accuracy and inter-caller concordance. Results: The variant callers differed in the minority variants they identified from the datasets. Concordance accuracy and inter-caller concordance were positively correlated with sample coverage. FreeBayes identified the majority of variants, although it was characterised by variable sensitivity and precision and by a high false positive rate relative to the other minority variant callers, which varied with sample coverage. LoFreq was the most conservative caller. Conclusions: We conducted a performance and concordance evaluation of four minority variant calling tools used to identify and quantify low-frequency variants. Inconsistency in the quality of sequenced samples affects the sensitivity and accuracy of minority variant callers. Our study suggests that combining at least three tools when identifying minority variants is useful in filtering errors when calling low-frequency variants.
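
    The suggestion to combine at least three tools can be illustrated with a minimal sketch. This is not the authors' code: the variant key (position, reference base, alternate base), the function names, and the example calls below are all hypothetical, chosen only to show how a consensus filter and a comparison against known spiked-in variants might look.

```python
from collections import Counter

# Assumption: a variant is keyed by (position, ref base, alt base).
Variant = tuple[int, str, str]


def consensus_calls(calls_by_caller: dict[str, set[Variant]],
                    min_callers: int = 3) -> set[Variant]:
    """Keep variants reported by at least `min_callers` callers."""
    counts = Counter(v for calls in calls_by_caller.values() for v in calls)
    return {v for v, n in counts.items() if n >= min_callers}


def sensitivity_precision(called: set[Variant],
                          truth: set[Variant]) -> tuple[float, float]:
    """Compare a call set against the known (spiked-in) variants."""
    true_positives = len(called & truth)
    sensitivity = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(called) if called else 0.0
    return sensitivity, precision


# Hypothetical spiked-in variants and caller outputs (illustrative only).
truth = {(100, "A", "G"), (250, "C", "T"), (400, "G", "A")}
calls = {
    "FreeBayes": {(100, "A", "G"), (250, "C", "T"), (400, "G", "A"),
                  (512, "T", "C"), (777, "A", "C")},
    "LoFreq": {(100, "A", "G"), (250, "C", "T")},
    "VarDict": {(100, "A", "G"), (250, "C", "T"), (512, "T", "C")},
    "VarScan2": {(100, "A", "G"), (250, "C", "T"), (400, "G", "A")},
}

consensus = consensus_calls(calls, min_callers=3)
union = consensus_calls(calls, min_callers=1)
consensus_sens, consensus_prec = sensitivity_precision(consensus, truth)
union_sens, union_prec = sensitivity_precision(union, truth)
```

    In this toy data, requiring three callers removes both false positives (precision rises from 0.6 to 1.0) but also drops one true variant seen by only two callers, so the threshold trades sensitivity for precision.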

    NGS-pipe: a flexible, easily extendable, and highly configurable framework for NGS analysis

    Next-generation sequencing is now an established method in genomics, and massive amounts of sequencing data are being generated on a regular basis. Analysis of the sequencing data is typically performed by lab-specific in-house solutions, but the agreement of results from different facilities is often small. General standards for quality control, reproducibility, and documentation are missing. We developed NGS-pipe, a flexible, transparent, and easy-to-use framework for the design of pipelines to analyze whole-exome, whole-genome, and transcriptome sequencing data. NGS-pipe facilitates the harmonization of genomic data analysis by supporting quality control, documentation, reproducibility, parallelization, and easy adaptation to other NGS experiments. NGS-pipe is available at https://github.com/cbg-ethz/NGS-pipe

    Investigating intratumour heterogeneity analysis methods and their application in GBM

    Glioblastoma (GBM) is an incurable cancer with a median survival of 15 months. Despite debulking surgery, cancer cells are inevitably left behind in the surrounding brain, with a minority able to resist subsequent chemoradiotherapy and eventually form a recurrent tumour. This resistance is likely influenced by the cells’ genotypes, which show high variability (intratumour heterogeneity) as a result of tumour evolution. Characterising changes in the genetic architecture of tumours through therapy may allow us to understand the effect that different mutations and pathways have on cell survival, and potentially identify novel targets for counteracting resistance in GBM. Such analyses involve detecting mutations in bulk tumour samples and then delineating them into individual genetically distinct ‘subclones’ through subclonal deconvolution. This is a complex process, with no reliable guidelines for the best pipelines to use. I therefore developed methods to allow simulation and in silico sequencing of genomes from realistically complex, artificial tumour samples, so that I could benchmark such pipelines. This revealed that no tested pipeline using single bulk samples showed a high level of accuracy, though mutation calling with Mutect2 and FACETS, followed by subclonal deconvolution with Ccube, showed the best results. I then used alternative approaches with the largest longitudinal GBM dataset investigated to date. I found that evidence of strong subclonal selection is absent in many samples, and not associated with therapy. Nonetheless, this does not negate the possibility of smaller, or less frequent, pockets of altered fitness. Using pathway analysis combined with variants that are informative of tumour progression, I identified processes that may confer increased resistance, or sensitisation, to therapy, and which warrant further investigation. Lastly, I applied subclonal deconvolution to investigate mouse-specific evolution in GBM patient-derived orthotopic xenografts and found no clear evidence to suggest these models are unsuitable for investigations relevant to humans.

    Bioinformatics and computational tools for next-generation sequencing analysis in clinical genetics

    Clinical genetics has an important role in the healthcare system in providing a definitive diagnosis for many rare syndromes. It can also influence genetic prevention and disease prognosis, and assist in selecting the best care and treatment options for patients. Next-generation sequencing (NGS) has transformed clinical genetics, making it possible to analyze hundreds of genes at an unprecedented speed and at a lower price than conventional Sanger sequencing. Despite the growing literature concerning NGS in a clinical setting, this review aims to fill the gap that exists among (bio)informaticians, molecular geneticists and clinicians by presenting a general overview of the NGS technology and workflow. First, we review the current NGS platforms, focusing on the two main platforms, Illumina and Ion Torrent, and discussing the major strengths and weaknesses intrinsic to each platform. Next, the NGS analytical bioinformatic pipelines are dissected, with some emphasis on the algorithms commonly used to generate and process data and to analyze sequence variants. Finally, the main challenges around NGS bioinformatics are placed in perspective for future developments. Even with the huge achievements made in NGS technology and bioinformatics, further improvements in bioinformatic algorithms are still required to deal with complex and genetically heterogeneous disorders.

    Analysis pipelines for cancer genome sequencing in mice

    Mouse models of human cancer have transformed our ability to link genetics, molecular mechanisms and phenotypes. Both reverse and forward genetics in mice are currently gaining momentum through advances in next-generation sequencing (NGS). Methodologies to analyze sequencing data were, however, developed for humans and hence do not account for species-specific differences in genome structures and experimental setups. Here, we describe standardized computational pipelines specifically tailored to the analysis of mouse genomic data. We present novel tools and workflows for the detection of different alteration types, including single-nucleotide variants (SNVs), small insertions and deletions (indels), copy-number variations (CNVs), loss of heterozygosity (LOH) and complex rearrangements, such as in chromothripsis. Workflows have been extensively validated and cross-compared using multiple methodologies. We also give step-by-step guidance on the execution of individual analysis types, provide advice on data interpretation and make the complete code available online. The protocol takes 2–7 d, depending on the desired analyses. D.S. is supported by the European Research Council (Consolidator Grant 648521) and the Deutsche Forschungsgemeinschaft (SA1374/4-2; SFB 1321). I.V. is supported by the European Research Council (Starting Grant INTRAHETEROSEQ) and the Spanish Government (SAF2016-76758-R). R.R. is supported by the European Research Council (Consolidator Grants PACA-MET and MSCA-ITN-ETN PRECODE), the Deutsche Forschungsgemeinschaft (DFG RA1629/2-1; SFB1243; SFB1321; SFB1335), the German Cancer Consortium Joint Funding Program, and the Deutsche Krebshilfe (70112480).

    A Novel Approach to the Comparative Genomic Analysis of Canine and Human Cancers

    Study of canine cancer’s molecular underpinnings holds great potential for informing veterinary and human oncology. Sporadic canine cancers are highly abundant (~4 million diagnoses/year in the United States), and the dog’s unique genomic architecture due to selective inbreeding, alongside the high similarity between dog and human genomes, both confer power for improving understanding of cancer genes. However, characterization of canine cancer genome landscapes has been limited. It is hindered by a lack of canine-specific tools and resources. To enable robust and reproducible comparative genomic analysis of canine cancers, I have developed a workflow for somatic and germline variant calling in canine cancer genomic data. I first adapted a human cancer genomics pipeline to create a semi-automated canine pipeline used to map genomic landscapes of canine melanoma, lung adenocarcinoma, osteosarcoma and lymphoma. This pipeline also forms the backbone of my novel comparative genomics workflow. Practical impediments to comparative genomic analysis of dog and human include challenges in identifying similarities in mutation type and function across species. For example, canine genes and their human orthologs may have evolved to perform different functions. Hence, I undertook a systematic statistical evaluation of dog and human cancer genes and assessed functional similarities and differences between orthologs to improve understanding of the roles of these genes in cancer across species. I tested this pipeline on canine and human Diffuse Large B-Cell Lymphoma (DLBCL), given that canine DLBCL is the most comprehensively genomically characterized canine cancer. Logistic regression with genes bearing somatic coding mutations in each cancer was used to determine whether conservation metrics (sequence identity, network placement, etc.) could explain co-mutation of genes in both species. Using this model, I identified 25 co-mutated and evolutionarily similar genes that may be compelling cross-species cancer genes. For example, PCLO was identified as a co-mutated conserved gene; it has previously been identified as recurrently mutated in human DLBCL, but with an unclear role in oncogenesis. Further investigation of these genes might shed new light on the biology of lymphoma in dogs and humans, and this approach may more broadly serve to prioritize new genes for comparative cancer biology studies. Dissertation/Thesis: Doctoral Dissertation, Biomedical Informatics 201