    Diagnostic applications of next generation sequencing: working towards quality standards

    Over the past 6 years, next generation sequencing (NGS) has been established as a valuable high-throughput method for research in molecular genetics and has successfully been employed in the identification of rare and common genetic variations. All major NGS technology companies providing commercially available instruments (Roche 454, Illumina, Life Technologies) have recently marketed bench top sequencing instruments with lower throughput and shorter run times, thereby broadening the applications of NGS and opening the technology to the potential use for clinical diagnostics. Although the high expectations regarding the discovery of new diagnostic targets and an overall reduction of cost have been achieved, technological challenges in instrument handling, robustness of the chemistry and data analysis need to be overcome. To facilitate the implementation of NGS as a routine method in molecular diagnostics, consistent quality standards need to be developed. Here the authors give an overview of the current standards in protocols and workflows and discuss possible approaches to define quality criteria for NGS in molecular genetic diagnostics

    Genome sequencing of the extinct Eurasian wild aurochs, Bos primigenius, illuminates the phylogeography and evolution of cattle

    Background: Domestication of the now-extinct wild aurochs, Bos primigenius, gave rise to the two major domestic extant cattle taxa, B. taurus and B. indicus. While previous genetic studies have shed some light on the evolutionary relationships between European aurochs and modern cattle, important questions remain unanswered, including the phylogenetic status of aurochs, whether gene flow from aurochs into early domestic populations occurred, and which genomic regions were subject to selection processes during and after domestication. Here, we address these questions using whole-genome sequencing data generated from an approximately 6,750-year-old British aurochs bone and genome sequence data from 81 additional cattle plus genome-wide single nucleotide polymorphism data from a diverse panel of 1,225 modern animals. Results: Phylogenomic analyses place the aurochs as a distinct outgroup to the domestic B. taurus lineage, supporting the predominant Near Eastern origin of European cattle. Conversely, traditional British and Irish breeds share more genetic variants with this aurochs specimen than other European populations, supporting localized gene flow from aurochs into the ancestors of modern British and Irish cattle, perhaps through purposeful restocking by early herders in Britain. Finally, the functions of genes showing evidence for positive selection in B. taurus are enriched for neurobiology, growth, metabolism and immunobiology, suggesting that these biological processes have been important in the domestication of cattle. Conclusions: This work provides important new information regarding the origins and functional evolution of modern cattle, revealing that the interface between early European domestic populations and wild aurochs was significantly more complex than previously thought

    Canfam GSD: De novo chromosome-length genome assembly of the German Shepherd Dog (Canis lupus familiaris) using a combination of long reads, optical mapping, and Hi-C

    Background: The German Shepherd Dog (GSD) is one of the most common breeds on earth and has been bred for its utility and intelligence. It is often first choice for police and military work, as well as protection, disability assistance, and search-and-rescue. Yet, GSDs are well known to be susceptible to a range of genetic diseases that can interfere with their training. Such diseases are of particular concern when they occur later in life, and fully trained animals are not able to continue their duties. Findings: Here, we provide the draft genome sequence of a healthy German Shepherd female as a reference for future disease and evolutionary studies. We generated this improved canid reference genome (CanFam GSD) utilizing a combination of Pacific Biosciences, Oxford Nanopore, 10X Genomics, Bionano, and Hi-C technologies. The GSD assembly is ∼80 times as contiguous as the current canid reference genome, CanFam v3.1 (20.9 vs 0.267 Mb contig N50), and contains far fewer gaps (306 vs 23,876) and fewer scaffolds (429 vs 3,310). Two chromosomes (4 and 35) are assembled into single scaffolds with no gaps. BUSCO analyses of the genome assembly show that 93.0% of the conserved single-copy genes are complete in the GSD assembly compared with 92.2% for CanFam v3.1. Homology-based gene annotation increases this value to ∼99%. Detailed examination of the evolutionarily important pancreatic amylase region reveals that there are most likely 7 copies of the gene, indicative of a duplication of 4 ancestral copies and the disruption of 1 copy. Conclusions: GSD genome assembly and annotation were produced with major improvement in completeness, continuity, and quality over the existing canid reference. This resource will enable further research related to canine diseases, the evolutionary relationships of canids, and other aspects of canid biology
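
    The contiguity comparison above rests on the contig N50 statistic. As a minimal sketch, assuming the usual definition (the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size), it can be computed as follows. The function name and the toy contig lengths are hypothetical and are not taken from the paper; only the two N50 values in the last lines come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the total.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

# Fold improvement implied by the two contig N50 values quoted in the abstract.
fold = 20.9 / 0.267
print(toy_n50, round(fold, 1))
```

    With the quoted values, the ratio comes to roughly 78, which agrees with the rounded figure of ∼80 given in the abstract.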

    The Australian dingo is an early offshoot of modern breed dogs

    Dogs are uniquely associated with human dispersal and bring transformational insight into the domestication process. Dingoes represent an intriguing case within canine evolution being geographically isolated for thousands of years. Here, we present a high-quality de novo assembly of a pure dingo (CanFam_DDS). We identified large chromosomal differences relative to the current dog reference (CanFam3.1) and confirmed no expanded pancreatic amylase gene as found in breed dogs. Phylogenetic analyses using variant pairwise matrices show that the dingo is distinct from five breed dogs with 100% bootstrap support when using Greenland wolf as the outgroup. Functionally, we observe differences in methylation patterns between the dingo and German shepherd dog genomes and differences in serum biochemistry and microbiome makeup. Our results suggest that distinct demographic and environmental conditions have shaped the dingo genome. In contrast, artificial human selection has likely shaped the genomes of domestic breed dogs after divergence from the dingo

    Sequencing error correction without a reference genome

    Background: Next (second) generation sequencing is an increasingly important tool for many areas of molecular biology; however, care must be taken when interpreting its output. Even a low error rate can cause a large number of errors due to the high number of nucleotides being sequenced. Distinguishing sequencing errors from true biological variants is a challenging task, and it is even more difficult for organisms without a reference genome. Results: We have developed a method for the correction of sequencing errors in data from the Illumina Solexa sequencing platforms. It does not require a reference genome and is of relevance for microRNA studies, unsequenced genomes, variant detection in ultra-deep sequencing and even for RNA-Seq studies of organisms with sequenced genomes where RNA editing is being considered. Conclusions: The derived error model is novel in that it allows different error probabilities for each position along the read, in conjunction with different error rates depending on the particular nucleotides involved in the substitution, and does not force these effects to behave in a multiplicative manner. The model provides error rates which capture the complex effects and interactions of the three main known causes of sequencing error associated with the Illumina platforms. Julie A Sleep, Andreas W Schreiber and Ute Bauman
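
    To illustrate what an error model with position-specific and substitution-specific rates that are not forced to be multiplicative might look like, the sketch below keeps a separate rate for every (position, true base, observed base) combination rather than the product of a position effect and a substitution effect. This is not the authors' method: for simplicity it assumes the true sequence of each read is known, which the published approach does not require, and the function name and toy reads are hypothetical.

```python
from collections import defaultdict


def error_rate_table(pairs):
    """Estimate P(observed base | position, true base) from read pairs.

    `pairs` is an iterable of (true_read, observed_read) strings of equal
    length. Assumption for illustration only: the true reads are known.
    Each (position, true, observed) cell is estimated independently, so
    position and substitution effects are free to interact.
    """
    counts = defaultdict(int)   # (position, true, observed) -> count
    totals = defaultdict(int)   # (position, true) -> count
    for true_read, obs_read in pairs:
        for pos, (t, o) in enumerate(zip(true_read, obs_read)):
            counts[(pos, t, o)] += 1
            totals[(pos, t)] += 1
    return {key: n / totals[key[:2]] for key, n in counts.items()}


# Hypothetical toy data: four reads of the same true sequence.
pairs = [
    ("ACGT", "ACGT"),
    ("ACGT", "ACGA"),  # T>A substitution at the last position
    ("ACGT", "ACTT"),  # G>T substitution at the third position
    ("ACGT", "ACGT"),
]
rates = error_rate_table(pairs)
print(rates[(3, "T", "A")], rates[(2, "G", "T")])
```

    A full table of this kind needs many more observations per cell than a multiplicative model, which is the trade-off for capturing interactions between read position and substitution type.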

    The efficacy of high-throughput sequencing and target enrichment on charred archaeobotanical remains

    The majority of archaeological plant material is preserved in a charred state. Obtaining reliable ancient DNA data from these remains has presented challenges due to high rates of nucleotide damage, short DNA fragment lengths, low endogenous DNA content and the potential for modern contamination. It has been suggested that high-throughput sequencing (HTS) technologies coupled with DNA enrichment techniques may overcome some of these limitations. Here we report the findings of HTS and target enrichment on four important archaeological crops (barley, grape, maize and rice) performed in three different laboratories, presenting the largest HTS assessment of charred archaeobotanical specimens to date. Rigorous analysis of our data, excluding false positives due to background contamination or incorrect index assignments, indicated a lack of endogenous DNA in nearly all samples, except for one lightly charred maize cob. Even with target enrichment, this sample failed to yield adequate data required to address fundamental questions in archaeology and biology. We further reanalysed part of an existing dataset on charred plant material, and found all purported endogenous DNA sequences were likely to be spurious. We suggest these technologies are not suitable for use with charred archaeobotanicals and urge great caution when interpreting data obtained by HTS of these remains

    Diversity of isoprene-degrading bacteria in phyllosphere and soil communities from a high isoprene-emitting environment: a Malaysian oil palm plantation

    Background: Isoprene is the most abundantly produced biogenic volatile organic compound (BVOC) on Earth, with annual global emissions almost equal to those of methane. Despite its importance in atmospheric chemistry and climate, little is known about the biological degradation of isoprene in the environment. The largest source of isoprene is terrestrial plants, and oil palms, the cultivation of which is expanding rapidly, are among the highest isoprene-producing trees. Results: DNA stable isotope probing (DNA-SIP) to study the microbial isoprene-degrading community associated with oil palm trees revealed novel genera of isoprene-utilising bacteria including Novosphingobium, Pelomonas, Rhodoblastus, Sphingomonas and Zoogloea in both oil palm soils and on leaves. Amplicon sequencing of isoA genes, which encode the α-subunit of the isoprene monooxygenase (IsoMO), a key enzyme in isoprene metabolism, confirmed that oil palm trees harbour a novel diversity of isoA sequences. In addition, metagenome assembled genomes (MAGs) were reconstructed from oil palm soil and leaf metagenomes and putative isoprene degradation genes were identified. Analysis of unenriched metagenomes showed that isoA-containing bacteria are more abundant in soils than in the oil palm phyllosphere. Conclusion: This study greatly expands the known diversity of bacteria that can metabolise isoprene and contributes to a better understanding of the biological degradation of this important but neglected climate-active gas

    Population genomics reveals that within-fungus polymorphism is common and maintained in populations of the mycorrhizal fungus Rhizophagus irregularis.

    Arbuscular mycorrhizal (AM) fungi are symbionts of most plants, increasing plant growth and diversity. The model AM fungus Rhizophagus irregularis (isolate DAOM 197198) exhibits low within-fungus polymorphism. In contrast, another study reported high within-fungus variability. Experiments with other R. irregularis isolates suggest that within-fungus genetic variation can affect the fungal phenotype and plant growth, highlighting the biological importance of such variation. We investigated whether there is evidence of differing levels of within-fungus polymorphism in an R. irregularis population. We genotyped 20 isolates using restriction site-associated DNA sequencing and developed novel approaches for characterizing polymorphism among haploid nuclei. All isolates exhibited higher within-isolate poly-allelic single-nucleotide polymorphism (SNP) densities than DAOM 197198 in repeated and non-repeated sites mapped to the reference genome. Poly-allelic SNPs were independently confirmed. Allele frequencies within isolates deviated from those expected for diploids or tetraploids, or for a strict dikaryote. Phylogeny based on poly-allelic sites was robust and mirrored the standard phylogeny. This indicates that within-fungus genetic variation is maintained in AM fungal populations. Our results predict a heterokaryotic state in the population, considerable differences in copy number variation among isolates and divergence among the copies, or aneuploidy in some isolates. The variation may be a combination of all of these hypotheses. Within-isolate genetic variation in R. irregularis leads to large differences in plant growth. Therefore, characterizing genomic variation within AM fungal populations is of major ecological importance

    p53 Gene Repair with Zinc Finger Nucleases Optimised by Yeast 1-Hybrid and Validated by Solexa Sequencing

    The tumor suppressor gene p53 is mutated or deleted in over 50% of human tumors. As functional p53 plays a pivotal role in protecting against cancer development, several strategies for restoring wild-type (wt) p53 function have been investigated. In this study, we applied an approach using gene repair with zinc finger nucleases (ZFNs). We adapted a commercially-available yeast one-hybrid (Y1H) selection kit to allow rapid building and optimization of 4-finger constructs from randomized PCR libraries. We thus generated novel functional zinc finger nucleases against two DNA sites in the human p53 gene, near cancer mutation ‘hotspots’. The ZFNs were first validated using in vitro cleavage assays and in vivo episomal gene repair assays in HEK293T cells. Subsequently, the ZFNs were used to restore wt-p53 status in the SF268 human cancer cell line, via ZFN-induced homologous recombination. The frequency of gene repair and mutation by non-homologous end-joining was then ascertained in several cancer cell lines, using a deep sequencing strategy. Our Y1H system facilitates the generation and optimisation of novel, sequence-specific four- to six-finger peptides, and the p53-specific ZFN described here can be used to mutate or repair p53 in genomic loci

    Characterization in vitro and in vivo of a pandemic H1N1 influenza virus from a fatal case

    Pandemic 2009 H1N1 (pH1N1) influenza viruses caused mild symptoms in most infected patients. However, a greater rate of severe disease was observed in healthy young adults and children without co-morbid conditions. Here we tested whether influenza strains displaying differential virulence could be present among circulating pH1N1 viruses. The biological properties and the genotype of viruses isolated from a patient showing mild disease (M) or from a fatal case (F), both without known co-morbid conditions, were compared in vitro and in vivo. The F virus presented faster growth kinetics and stronger induction of cytokines than M virus in human alveolar lung epithelial cells. In the murine model in vivo, the F virus showed a stronger morbidity and mortality than M virus. Remarkably, a higher proportion of mice presenting infectious virus in the hearts was found in F virus-infected animals. Altogether, the data indicate that strains of pH1N1 virus with enhanced pathogenicity circulated during the 2009 pandemic. In addition, examination of chemokine receptor 5 (CCR5) genotype, recently reported as involved in severe influenza virus disease, revealed that the F virus-infected patient was homozygous for the deleted form of CCR5 receptor (CCR5Δ32). Funding Statement: This work was supported by Instituto de Salud Carlos III (Programa especial de investigación sobre la gripe pándemica GR09/0023, GR09/0040, GR09/0039) and Ciber de Enfermedades Respiratorias. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.