
    Development and Validation of Targeted Next-Generation Sequencing Panels for Detection of Germline Variants in Inherited Diseases.

    Context: The number of targeted next-generation sequencing (NGS) panels for genetic diseases offered by clinical laboratories is rapidly increasing. Before an NGS-based test is implemented in a clinical laboratory, appropriate validation studies are needed to determine the performance characteristics of the test. Objective: To provide examples of assay design and validation of targeted NGS gene panels for the detection of germline variants associated with inherited disorders. Data Sources: The approaches used by 2 clinical laboratories for the development and validation of targeted NGS gene panels are described, and important design and validation considerations are examined. Conclusions: Clinical laboratories must validate the performance specifications of each test before implementation. Test design specifications and validation data are provided, outlining important steps in the validation of targeted NGS panels by clinical diagnostic laboratories.
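The abstract does not name the metrics used to express performance characteristics. As an illustration only, the sketch below shows how variant-level results are commonly summarized when panel calls are compared against a reference (orthogonal) method. The `ConfusionCounts` class, the `performance` function, and the counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical variant-level comparison of a panel against a truth set.

    tp: variants called by the panel and present in the truth set
    fp: variants called by the panel but absent from the truth set
    fn: truth-set variants the panel missed
    tn: reference-negative positions correctly reported as non-variant
    """
    tp: int
    fp: int
    fn: int
    tn: int


def performance(c: ConfusionCounts) -> dict[str, float]:
    """Return common validation metrics as fractions in [0, 1]."""
    return {
        # Sensitivity (positive percent agreement): TP / (TP + FN)
        "sensitivity": c.tp / (c.tp + c.fn),
        # Specificity (negative percent agreement): TN / (TN + FP)
        "specificity": c.tn / (c.tn + c.fp),
        # Positive predictive value: TP / (TP + FP)
        "ppv": c.tp / (c.tp + c.fp),
    }


if __name__ == "__main__":
    counts = ConfusionCounts(tp=198, fp=2, fn=2, tn=99_798)  # made-up counts
    for name, value in performance(counts).items():
        print(f"{name}: {value:.4f}")
```

Reporting sensitivity and PPV side by side matters for panels because true negatives vastly outnumber variants, so specificity alone tends to look near-perfect.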

    BMC Genomics

    Background: Deep sequencing makes it possible to observe low-frequency viral variants and sub-populations with greater accuracy and sensitivity than ever before. Existing platforms can be used to multiplex a large number of samples; however, analysis of the resulting data is complex and involves separating barcoded samples and various read manipulation processes ending in final assembly. Many assembly tools were designed with larger genomes and higher-fidelity polymerases in mind and do not perform well with reads derived from highly variable viral genomes. Reference-based assemblers may leave gaps in viral assemblies, while de novo assemblers may struggle to assemble unique genomes. Results: The IRMA (iterative refinement meta-assembler) pipeline solves the problem of viral variation by iteratively optimizing read gathering and assembly. As with all reference-based assembly, reads are included in the assembly when they match consensus template sets; however, IRMA provides on-the-fly reference editing, correction, and optional elongation without the need for additional reference selection. This increases both read depth and breadth. IRMA also focuses on quality control, error correction, indel reporting, variant calling and variant phasing. In fact, IRMA's ability to detect and phase minor variants is one of its most distinguishing features. We have built modules for influenza and ebolavirus. We demonstrate usage and provide calibration data from mixture experiments. Methods for variant calling, phasing, and error estimation/correction have been redesigned to meet the needs of viral genomic sequencing. Conclusion: IRMA provides a robust next-generation sequencing assembly solution that is adapted to the needs and characteristics of viral genomes. The software solves issues related to the genetic diversity of viruses while providing customized variant calling, phasing, and quality control. IRMA is freely available for non-commercial use on Linux and Mac OS X and has been parallelized for high-throughput computing.
    Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-3030-6) contains supplementary material, which is available to authorized users.
    Date: 2016-09-05. PMID: 27595578. PMCID: PMC501193
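IRMA's actual algorithm is not described in this abstract. The toy sketch below only illustrates the general idea of alternating read gathering with consensus rebuilding until the template stops changing. It assumes reads are already placed on reference coordinates without gaps; `refine_consensus`, `mismatch_fraction`, and the 25% mismatch threshold are hypothetical.

```python
from collections import Counter


def mismatch_fraction(consensus: str, offset: int, read: str) -> float:
    """Fraction of read bases that differ from the consensus (ungapped placement)."""
    window = consensus[offset:offset + len(read)]
    mismatches = sum(1 for a, b in zip(window, read) if a != b)
    return mismatches / len(read)


def refine_consensus(reference: str, reads: list[tuple[int, str]],
                     max_mismatch_frac: float = 0.25, max_iter: int = 10):
    """Iteratively gather reads and rebuild a majority-vote consensus.

    Returns (consensus, reads gathered in the final iteration).
    """
    for off, seq in reads:
        if off < 0 or off + len(seq) > len(reference):
            raise ValueError("every read must lie fully within the reference")
    consensus = reference
    gathered: list[tuple[int, str]] = []
    for _ in range(max_iter):
        # 1. Read gathering: keep reads close enough to the current template.
        gathered = [(off, seq) for off, seq in reads
                    if mismatch_fraction(consensus, off, seq) <= max_mismatch_frac]
        # 2. Consensus update: majority base per position over gathered reads;
        #    uncovered positions keep the previous base (ties go to first-seen base).
        columns = [Counter() for _ in consensus]
        for off, seq in gathered:
            for i, base in enumerate(seq):
                columns[off + i][base] += 1
        new = "".join(col.most_common(1)[0][0] if col else old
                      for col, old in zip(columns, consensus))
        if new == consensus:
            break  # converged: the template no longer changes
        consensus = new
    return consensus, gathered


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    reads = [(0, "ACTAACGT"), (2, "TAAC"), (4, "ACGTAC")]
    cons, used = refine_consensus(ref, reads)
    print(cons, len(used))
```

In this example the short read `TAAC` is rejected against the starting reference (2 of 4 bases differ) but is gathered once the longer read has edited the template, which is the depth-increasing effect the abstract attributes to on-the-fly reference editing.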

    Bioinformatics and computational tools for next-generation sequencing analysis in clinical genetics

    Clinical genetics has an important role in the healthcare system in providing a definitive diagnosis for many rare syndromes. It can also inform genetic prevention and disease prognosis, and assist in selecting the best care or treatment options for patients. Next-generation sequencing (NGS) has transformed clinical genetics, making it possible to analyze hundreds of genes at unprecedented speed and at lower cost than conventional Sanger sequencing. Despite the growing literature on NGS in clinical settings, a gap remains among (bio)informaticians, molecular geneticists and clinicians; this review aims to fill it by presenting a general overview of NGS technology and workflow. First, we review the current NGS platforms, focusing on the two main platforms, Illumina and Ion Torrent, and discussing the major strengths and weaknesses intrinsic to each. Next, the NGS bioinformatic analysis pipelines are dissected, with some emphasis on the algorithms commonly used to generate and process data and to analyze sequence variants. Finally, the main challenges around NGS bioinformatics are placed in perspective for future developments. Even with the huge achievements made in NGS technology and bioinformatics, further improvements in bioinformatic algorithms are still required to deal with complex and genetically heterogeneous disorders.

    Exploiting the great potential of Sequence Capture data by a new tool, SUPER-CAP

    The recent development of Sequence Capture methodology represents a powerful strategy for enhancing data generation to assess genetic variation in targeted genomic regions. Here, we present SUPER-CAP, a bioinformatics web tool aimed at handling Sequence Capture data, precisely calculating the allele frequency of variants and building genotype-specific sequences of captured genes. The dataset used to develop this in silico strategy consists of 378 loci and related regulatory regions in a collection of 44 tomato landraces. About 14,000 high-quality variants were identified. The high coverage depth (>40×) and the adoption of appropriate filtering criteria allowed the identification of about 4,000 rare variants and 10 genes with copy number variation. We also show that the tool is capable of reconstructing genotype-specific sequences for each genotype by using the detected variants. This allows evaluation of the combined effect of multiple variants in the same protein. The architecture and functionality of SUPER-CAP make the software appropriate for a broad set of analyses, including SNP discovery and mining. Its functionality, together with its capability to process large data sets and efficiently detect sequence variation, makes SUPER-CAP a valuable bioinformatics tool for genomics and breeding purposes.
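The abstract links reliable allele-frequency estimates to high coverage depth. A minimal sketch of a per-site frequency calculation with a depth filter, assuming biallelic sites and a cutoff of 40 reads; the abstract reports >40× coverage but not SUPER-CAP's exact filter, and `allele_frequency` is a hypothetical helper, not the tool's API.

```python
from typing import Optional


def allele_frequency(ref_count: int, alt_count: int,
                     min_depth: int = 40) -> Optional[float]:
    """Alternate-allele frequency at one biallelic site, or None if too shallow.

    Depth is taken as ref_count + alt_count; reads supporting any other
    allele are ignored in this simplified sketch.
    """
    depth = ref_count + alt_count
    if depth < min_depth:
        return None  # too few reads to estimate the frequency reliably
    return alt_count / depth


if __name__ == "__main__":
    sites = {"locus_1:120": (30, 30), "locus_1:455": (10, 5), "locus_2:88": (38, 2)}
    for site, (ref, alt) in sites.items():
        print(site, allele_frequency(ref, alt))
```

The depth filter matters most for rare variants: at 40 reads a single supporting read corresponds to a 2.5% frequency, so low-depth sites cannot distinguish rare alleles from sequencing errors.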

    Bioinformatics Workflows for Genomic Variant Discovery, Interpretation and Prioritization

    Next-generation sequencing (NGS) techniques allow high-throughput detection of a vast number of variations in a cost-efficient manner. However, there are still inconsistencies and debates about how to process and analyse this ‘big data’. To accurately extract clinically relevant information from genomics data, choosing appropriate tools, knowing how best to use them and interpreting the results correctly are crucial. This chapter reviews state-of-the-art bioinformatics approaches in clinically relevant genomic variant detection. Best practices of reads-to-variant discovery workflows for germline and somatic short genomic variants are presented along with the most commonly used tools for each step. Additionally, methods for detecting structural variations are overviewed. Finally, approaches and current guidelines for clinical interpretation of genomic variants are discussed. As emphasized in this chapter, data processing and variant discovery steps are relatively well understood. The differences in prioritization algorithms, on the other hand, can be perplexing, creating a bottleneck during interpretation. This review aims to shed light on the pros and cons of these differences to help experts make more informed decisions.
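Reads-to-variant workflows usually end with a filtering step on the called variants. As a hypothetical illustration (the chapter's recommended filters are not given here), the sketch applies simple hard filters to VCF data lines using the standard QUAL column and the `DP` key in the INFO column; the thresholds of 30 and 10 are placeholder assumptions, and `passes_hard_filter` is not part of any real tool.

```python
def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO field into key/value pairs (flag keys map to "")."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def passes_hard_filter(vcf_line: str, min_qual: float = 30.0,
                       min_depth: int = 10) -> bool:
    """Keep a variant only if QUAL and INFO/DP meet the (assumed) thresholds."""
    cols = vcf_line.rstrip("\n").split("\t")
    qual, info = cols[5], parse_info(cols[7])  # columns 6 and 8 of a VCF record
    if qual == "." or float(qual) < min_qual:
        return False  # missing or low call quality
    depth = info.get("DP", "")
    return depth != "" and int(depth) >= min_depth


if __name__ == "__main__":
    records = [
        "chr1\t100\t.\tA\tG\t50\tPASS\tDP=35;AF=0.5",
        "chr1\t200\t.\tC\tT\t12.5\tPASS\tDP=35",
        "chr1\t300\t.\tG\tA\t60\tPASS\tDB;DP=8",
    ]
    print([passes_hard_filter(r) for r in records])
```

Fixed thresholds are the simplest option; production pipelines often prefer model-based recalibration, which is one source of the inconsistencies the chapter mentions.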

    SequenceLab: A Comprehensive Benchmark of Computational Methods for Comparing Genomic Sequences

    Computational complexity is a key limitation of genomic analyses. Thus, over the last 30 years, researchers have proposed numerous fast heuristic methods that provide computational relief. Comparing genomic sequences is one of the most fundamental computational steps in most genomic analyses. Due to its high computational complexity, optimized exact and heuristic algorithms are still being developed. We find that these methods are highly sensitive to the underlying data, its quality, and various hyperparameters. Despite their wide use, no in-depth analysis of this sensitivity has been performed, so these methods may falsely discard genetic sequences from further analysis and unnecessarily inflate computational costs. We provide the first analysis and benchmark of this heterogeneity. We deliver an actionable overview of the 11 most widely used state-of-the-art methods for comparing genomic sequences. We also inform readers about their advantages and downsides using thorough experimental evaluation and different real datasets from all major sequencing manufacturers (i.e., Illumina, ONT, and PacBio). SequenceLab is publicly available at https://github.com/CMU-SAFARI/SequenceLab
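One common exact measure that fast sequence-comparison heuristics try to approximate or avoid computing is edit distance. The sketch below is the classic quadratic dynamic program, shown only to make the cost concrete; it is not one of the 11 methods benchmarked in SequenceLab, and the function name is illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: O(len(a) * len(b)) time, O(len(b)) memory."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (0) or substitution (1)
            )
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACGTTGCA", "ACGTGCA"))
```

Because every cell of the len(a) × len(b) table is filled, comparing two 10 kb long reads already takes about 10^8 cell updates, which is why pre-alignment filters and banded or heuristic methods are so widely used.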

    Microbial community drivers of PK/NRP gene diversity in selected global soils

    Background: The emergence of antibiotic-resistant pathogens has created an urgent need for novel antimicrobial treatments. Advances in next-generation sequencing have opened new frontiers for natural product discovery programmes, allowing the exploitation of a larger fraction of the microbial community. Polyketide (PK) and non-ribosomal peptide (NRP) natural products have been reported to be related to compounds with antimicrobial and anticancer activities. We report here a new culture-independent approach to explore bacterial biosynthetic diversity and to determine the bacterial phyla in the microbial community associated with PK and NRP diversity in selected soils. Results: Through amplicon sequencing, we explored the microbial diversity (16S rRNA gene) of 13 soils from Antarctica, Africa, Europe and a Caribbean island and correlated this with the amplicon diversity of the adenylation (A) and ketosynthase (KS) domains within functional genes coding for non-ribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs), which are involved in the production of NRP and PK, respectively. Mantel and Procrustes correlation analyses with microbial taxonomic data identified not only the well-studied phyla Actinobacteria and Proteobacteria but also, interestingly, the less biotechnologically exploited phyla Verrucomicrobia and Bacteroidetes as potential sources harbouring diverse A and KS domains. Some soils, notably that from Antarctica, provided evidence of endemic diversity, whilst others, such as those from Europe, clustered together. In particular, the majority of the domain reads from Antarctica remained unmatched to known sequences, suggesting they could encode enzymes for potentially novel PK and NRP.
Conclusions: The approach presented here highlights potential sources of metabolic novelty in the environment and will be a useful precursor to metagenomic biosynthetic gene cluster mining for PKs and NRPs, which could provide leads for new antimicrobial metabolites.

    Identifying genetic markers for a range of phylogenetic utility–From species to family level

    Resolving the phylogenetic relationships of closely related species using a small set of loci is challenging, as sufficient information may not be captured from a limited sample of the genome. Relying on few loci can also be problematic when conflict between gene trees arises from incomplete lineage sorting and/or ongoing hybridization, problems especially likely in recently diverged lineages. Here, we developed a method using limited genomic resources that allows identification of many low-copy candidate loci from across the nuclear and chloroplast genomes, design of probes for target capture, and sequencing of the captured loci. To validate our method, we present data from Eucalyptus and Melaleuca, two large and phylogenetically problematic genera within the Myrtaceae family. With one annotated genome, one transcriptome and two whole-genome shotgun sequences of one Eucalyptus and four Melaleuca species, respectively, we identified 212 loci representing 263 kbp for targeted sequence capture and sequencing. Of these, 209 were successfully tested on 47 samples across five related genera of Myrtaceae. The average percentage of reads mapped back to the reference was 57.6%, with coverage of more than 20 reads per position across 83.5% of the data. The methods developed here should be applicable to a large range of taxa across all kingdoms. The core methods are very flexible, providing a platform for various levels of genomic resource availability, and are useful for both shallow and deep phylogenies.
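The two summary statistics reported above (percentage of reads mapped and breadth of coverage above 20 reads) can be computed as in the hypothetical sketch below. It assumes mapped and total read counts and per-position depths have already been extracted from the alignments; "more than 20 reads" is read as a strict `> 20` threshold, and the numbers in the example are made up, not the study's data.

```python
def mapping_rate(mapped_reads: int, total_reads: int) -> float:
    """Percentage of reads that mapped back to the reference."""
    return 100.0 * mapped_reads / total_reads


def breadth_above_depth(depths: list[int], min_depth: int = 20) -> float:
    """Percentage of target positions covered by more than min_depth reads."""
    covered = sum(1 for d in depths if d > min_depth)
    return 100.0 * covered / len(depths)


if __name__ == "__main__":
    print(mapping_rate(576, 1000))
    print(breadth_above_depth([0, 10, 20, 21, 50]))
```

Breadth is reported separately from mean depth because a capture experiment can have high average coverage concentrated on a few loci while many targeted positions stay below a usable depth.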

    Methods for Viral Intra-Host and Inter-Host Data Analysis for Next-Generation Sequencing Technologies

    The deep coverage offered by next-generation sequencing (NGS) technology has facilitated the reconstruction of intra-host RNA viral populations at an unprecedented level of detail. However, NGS data require sophisticated analysis that deals with millions of error-prone short reads. This dissertation first reviews the challenges of, and methods for, viral genomic data analysis in the NGS era. Second, it presents a software tool, CliqueSNV, for inferring viral quasispecies based on extracting pairs of statistically linked mutations from noisy reads, which effectively reduces sequencing noise and enables identification of minority haplotypes with frequencies below the sequencing error rate. Finally, the dissertation describes the algorithms VOICE and MinDistB for inferring relatedness between viral samples and identifying transmission clusters and sources of infection.
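CliqueSNV's actual linkage test is not described in this abstract. As a simplified, swapped-in illustration of what "statistically linked mutations" means, the sketch below computes standard linkage-disequilibrium statistics for two SNV sites from the 2×2 counts of reads covering both sites; `snv_linkage` and its argument names are hypothetical.

```python
def snv_linkage(n_rr: int, n_rm: int, n_mr: int, n_mm: int) -> tuple[float, float, float]:
    """Linkage statistics for two SNV sites from reads covering both.

    Argument names give the allele at site 1 then site 2:
    r = reference allele, m = minor (mutant) allele.
    Returns (D, r^2, chi-square with 1 degree of freedom).
    """
    total = n_rr + n_rm + n_mr + n_mm
    p1 = (n_mr + n_mm) / total  # minor-allele frequency at site 1
    p2 = (n_rm + n_mm) / total  # minor-allele frequency at site 2
    d = n_mm / total - p1 * p2  # excess co-occurrence of the two minor alleles
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    r2 = d * d / denom if denom > 0 else 0.0
    chi2 = total * r2  # Pearson chi-square of the 2x2 table equals N * r^2
    return d, r2, chi2


if __name__ == "__main__":
    print(snv_linkage(90, 0, 0, 10))  # minor alleles always travel together
    print(snv_linkage(81, 9, 9, 1))   # co-occurrence expected by chance
```

A chi-square above about 3.84 corresponds to p < 0.05 with 1 degree of freedom. Independent sequencing errors at two sites rarely co-occur on the same read, which is the intuition behind using linked pairs to separate real minority haplotypes from noise.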

    The challenges of defining the human nasopharyngeal resistome

    The nasopharynx is an important microbial reservoir for the emergence and spread of antibiotic-resistant organisms. The nasopharyngeal resistome is an extensive, adaptable reservoir of antibiotic-resistance genes (ARGs) within this niche. Metagenomic sequencing decodes the genetic material of all organisms within a sample using next-generation technologies, permitting unbiased discovery of novel ARGs and associated mobile genetic elements (MGEs). The challenges of sequencing a low-biomass bacterial sample have limited exploration of the nasopharyngeal resistome. Here, we explore the current understanding of the nasopharyngeal resistome, particularly the role of MGEs in propagating antimicrobial resistance (AMR), examine the advantages and limitations of metagenomic sequencing technologies and bioinformatic pipelines for nasopharyngeal resistome analysis, and highlight the key outstanding questions for future research.