    Single haplotype assembly of the human genome from a hydatidiform mole

    A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. To overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA, including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.

    Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

    We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics
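
The contig and scaffold N50 values quoted in the abstract above are contiguity statistics: the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length. The following is a minimal sketch of that calculation; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the seabass study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the
    length at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), not real assembly data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

The same function applies to scaffold lengths; only the input list changes.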

    Parameters for accurate genome alignment

    Background: Genome sequence alignments form the basis of much research. Genome alignment depends on various mundane but critical choices, such as how to mask repeats and which score parameters to use. Surprisingly, there has been no large-scale assessment of these choices using real genomic data. Moreover, rigorous procedures to control the rate of spurious alignment have not been employed. Results: We have assessed 495 combinations of score parameters for alignment of animal, plant, and fungal genomes. As our gold standard of accuracy, we used genome alignments implied by multiple alignments of proteins and of structural RNAs. We found the HOXD scoring schemes underlying alignments in the UCSC genome database to be far from optimal, and suggest better parameters. Higher values of the X-drop parameter are not always better. E-values accurately indicate the rate of spurious alignment, but only if tandem repeats are masked in a non-standard way. Finally, we show that γ-centroid (probabilistic) alignment can find highly reliable subsets of aligned bases. Conclusions: These results enable more accurate genome alignment, with reliability measures for local alignments and for individual aligned bases. This study was made possible by our new software, LAST, which can align vertebrate genomes in a few hours (http://last.cbrc.jp/).

    Identification of methylated deoxyadenosines in vertebrates reveals diversity in DNA modifications.

    Methylation of cytosine deoxynucleotides generates 5-methylcytosine (m(5)dC), a well-established epigenetic mark. However, in higher eukaryotes much less is known about modifications affecting other deoxynucleotides. Here, we report the detection of N(6)-methyldeoxyadenosine (m(6)dA) in vertebrate DNA, specifically in Xenopus laevis but also in other species including mouse and human. Our methylome analysis reveals that m(6)dA is widely distributed across the eukaryotic genome and is present in different cell types but is commonly depleted from gene exons. Thus, direct DNA modifications might be more widespread than previously thought. M.J.K. was supported by the Long-Term Human Frontiers Fellowship (LT000149/2010-L), the Medical Research Council grant (G1001690), and by the Isaac Newton Trust Fellowship (R G76588). The work was sponsored by the Biotechnology and Biological Sciences Research Council grant BB/M022994/1 (J.B.G. and M.J.K.). The Gurdon laboratory is funded by the grant 101050/Z/13/Z (J.B.G.) from the Wellcome Trust, and is supported by the Gurdon Institute core grants, namely by the Wellcome Trust Core Grant (092096/Z/10/Z) and by the Cancer Research UK Grant (C6946/A14492). C.R.B. and G.E.A. are funded by the Wellcome Trust Core Grant. We are grateful to D. Simpson and R. Jones-Green for preparing X. laevis eggs and oocytes, F. Miller for providing us with M. musculus tissue, T. Dyl for X. laevis eggs and D. rerio samples, and to Gurdon laboratory members for their critical comments. We thank U. Ruether for providing us with M. musculus kidney DNA (Entwicklungs- und Molekularbiologie der Tiere, Heinrich Heine Universitaet Duesseldorf, Germany). We also thank J. Ahringer, S. Jackson, A. Bannister and T. Kouzarides for critical input and advice, M. Sciacovelli and E. Gaude for suggestions. This is the author accepted manuscript. The final version is available from Nature Publishing Group via http://dx.doi.org/10.1038/nsmb.314

    Abundant Degenerate Miniature Inverted-Repeat Transposable Elements in Genomes of Epichloid Fungal Endophytes of Grasses

    Miniature inverted-repeat transposable elements (MITEs) are abundant repeat elements in plant and animal genomes; however, there are few analyses of these elements in fungal genomes. Analysis of the draft genome sequence of the fungal endophyte Epichloë festucae revealed 13 MITE families that make up almost 1% of the E. festucae genome, and relics of putative autonomous parent elements were identified for three families. Sequence and DNA hybridization analyses suggest that at least some of the MITEs identified in the study were active early in the evolution of Epichloë but are not found in closely related genera. Analysis of MITE integration sites showed that these elements have a moderate integration site preference for 5′ genic regions of the E. festucae genome and are particularly enriched near genes for secondary metabolism. Copies of the EFT-3m/Toru element appear to have mediated recombination events that may have abolished synthesis of two fungal alkaloids in different epichloae. This work provides insight into the potential impact of MITEs on epichloae evolution and provides a foundation for analysis in other fungal genomes

    Structural variation in the chicken genome identified by paired-end next-generation DNA sequencing of reduced representation libraries

    Background: Variation within individual genomes ranges from single nucleotide polymorphisms (SNPs) to kilobase, and even megabase, sized structural variants (SVs), such as deletions, insertions, inversions, and more complex rearrangements. Although much is known about the extent of SVs in humans and mice, species in which they exert significant effects on phenotypes, very little is known about the extent of SVs in the 2.5-times smaller and less repetitive genome of the chicken. Results: We identified hundreds of shared and divergent SVs in four commercial chicken lines relative to the reference chicken genome. The majority of SVs were found in intronic and intergenic regions, and we also found SVs in the coding regions. To identify the SVs, we combined high-throughput short read paired-end sequencing of genomic reduced representation libraries (RRLs) of pooled samples from 25 individuals and computational mapping of DNA sequences from a reference genome. Conclusion: We provide a first glimpse of the high abundance of small structural genomic variations in the chicken. Extrapolating our results, we estimate that there are thousands of rearrangements in the chicken genome, the majority of which are located in non-coding regions. We observed that structural variation contributes to genetic differentiation among current domesticated chicken breeds and the Red Jungle Fowl. We expect that, because of their high abundance, SVs might explain phenotypic differences and play a role in the evolution of the chicken genome. Finally, our study exemplifies an efficient and cost-effective approach for identifying structural variation in sequenced genomes.

    A Snapshot of CNVs in the Pig Genome

    Recent studies of mammalian genomes have uncovered the extent of copy number variation (CNV) that contributes to phenotypic diversity, including health and disease status. Here we report a first account of CNVs in the pig genome, covering the parts of chromosomes 4, 7, 14, and 17 that have already been sequenced and assembled. A custom tiling oligonucleotide array with a median probe spacing of 409 bp was used to screen 12 unrelated Duroc boars that are founders of a large pedigree. After a strict CNV calling pipeline, 37 copy number variable regions (CNVRs) across all four chromosomes were identified, with five CNVRs overlapping segmental duplications, three overlapping pig unigenes and one overlapping a RefSeq pig mRNA. This CNV snapshot analysis is the first of its kind in the porcine genome and constitutes the basis for a better understanding of porcine phenotypes and genotypes with the prospect of identifying important economic traits

    The Application of DNA Barcodes for the Identification of Marine Crustaceans from the North Sea and Adjacent Regions

    In recent years, DNA barcoding has become a popular method of choice for molecular specimen identification. Here we present a comprehensive DNA barcode library of various crustacean taxa found in the North Sea, one of the most extensively studied marine regions of the world. Our data set includes 1,332 barcodes covering 205 species, including taxa of the Amphipoda, Copepoda, Decapoda, Isopoda, Thecostraca, and others. This data set represents the most extensive DNA barcode library of the Crustacea in terms of species number to date. By using the Barcode of Life Data Systems (BOLD), unique BINs were identified for 198 (96.6%) of the analyzed species. Six species were characterized by two BINs (2.9%), and three BINs were found for the amphipod species Gammarus salinus Spooner, 1947 (0.4%). Intraspecific distances with values higher than 2.2% were revealed for 13 species (6.3%). Exceptionally high distances of up to 14.87% between two distinct but monophyletic clusters were found for the parasitic copepod Caligus elongatus Nordmann, 1832, supporting the results of previous studies that indicated the existence of an overlooked sea louse species. In contrast to these high distances, haplotype sharing was observed for two decapod spider crab species, Macropodia parva Van Noort & Adema, 1985 and Macropodia rostrata (Linnaeus, 1761), underlining the need for a taxonomic revision of both species. In summary, our study confirms that DNA barcodes are a highly effective identification system for the analyzed marine crustaceans of the North Sea and represents an important milestone for modern biodiversity assessment studies using barcode sequence
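
The intraspecific distances reported in the abstract above are pairwise sequence distances between aligned barcodes. The abstract does not say which distance model was used, so the sketch below assumes the simplest one, the uncorrected p-distance (the fraction of compared sites that differ); barcode studies often use a model-corrected distance instead. The function name `p_distance` and the toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two aligned sequences.

    Assumption: positions where either sequence has a gap ('-') or an
    ambiguous base ('N') are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip sites that cannot be compared
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Toy aligned fragments, not real barcode data: 1 difference in 10 sites.
print(f"{p_distance('ACGTACGTAC', 'ACGTACGTAA'):.1%}")
```

Under this assumption, a threshold such as the 2.2% quoted in the abstract would correspond to roughly 22 differing sites per 1,000 compared.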

    Cryptic species in a well-known habitat: applying taxonomics to the amphipod genus Epimeria (Crustacea, Peracarida)

    Taxonomy plays a central role in biological sciences. It provides a communication system for scientists as it aims to enable correct identification of the studied organisms. As a consequence, species descriptions should seek to include as much available information as possible at species level to follow an integrative concept of ‘taxonomics’. Here, we describe the cryptic species Epimeria frankei sp. nov. from the North Sea, and also redescribe its sister species, Epimeria cornigera. The morphological information obtained is substantiated by DNA barcodes and complete nuclear 18S rRNA gene sequences. In addition, we provide, for the first time, full mitochondrial genome data as part of a metazoan species description for a holotype, as well as the neotype. This study represents the first successful implementation of the recently proposed concept of taxonomics, using data from high-throughput technologies for integrative taxonomic studies, allowing the highest level of confidence for both biodiversity and ecological research.