
    Single haplotype assembly of the human genome from a hydatidiform mole

    A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. To overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA, including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.

    "Head-to-head" and "tail-to-tail" 180-degree domain walls in an isolated ferroelectric

    "Head-to-head" and "tail-to-tail" 180-degree domain-walls in a finite isolated ferroelectric sample are theoretically studied using Landau theory. The full set of equations, suitable for numerical calculations is developed. The explicit expressions for the polarization profile across the walls are derived for several limiting cases and wall-widths are estimated. It is shown analytically that different regimes of screening and different dependences for width of charged domain walls on the temperature and parameters of the system are possible, depending on spontaneous polarization and concentration of carriers in the material. It is shown that the half-width of charged domain walls in typical perovskites is about the nonlinear Thomas-Fermi screening-length and about one order of magnitude larger than the half-width of neutral domain-walls. The formation energies of "head-to-head" walls under different regimes of screening are obtained, neglecting the poling ability of the surface. It is shown that either "head-to-head" or "tail-to-tail" configuration can be energetically favorable in comparison with the monodomain state of the ferroelectric if the poling ability of the surface is large enough. If this is not the case, the existence of charged domain walls in bulk ferroelectrics is merely a result of the domain-growth kinetics. Size-effect corresponding to the competition between state with charged domain wall, single domain state, multidomain state, and the state with the zero polarization is considered. The results obtained for the case of an isolated ferroelectric sample were compared with the results for an electroded sample. It was shown that charged domain wall in electroded sample can be either metastable or stable, depends on the work function difference between electrodes and ferroelectric and the poling ability of the electrode/ferroelectric interface.Comment: 47 pages, 10 figure

    Parameters for accurate genome alignment

    Background: Genome sequence alignments form the basis of much research. Genome alignment depends on various mundane but critical choices, such as how to mask repeats and which score parameters to use. Surprisingly, there has been no large-scale assessment of these choices using real genomic data. Moreover, rigorous procedures to control the rate of spurious alignment have not been employed. Results: We have assessed 495 combinations of score parameters for alignment of animal, plant, and fungal genomes. As our gold-standard of accuracy, we used genome alignments implied by multiple alignments of proteins and of structural RNAs. We found the HOXD scoring schemes underlying alignments in the UCSC genome database to be far from optimal, and suggest better parameters. Higher values of the X-drop parameter are not always better. E-values accurately indicate the rate of spurious alignment, but only if tandem repeats are masked in a non-standard way. Finally, we show that γ-centroid (probabilistic) alignment can find highly reliable subsets of aligned bases. Conclusions: These results enable more accurate genome alignment, with reliability measures for local alignments and for individual aligned bases. This study was made possible by our new software, LAST, which can align vertebrate genomes in a few hours (http://last.cbrc.jp/).

    Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

    We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of the L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.
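
The contig and scaffold N50 figures quoted in this abstract follow the usual definition of N50: the length L such that sequences of length L or longer together cover at least half of the total assembled length. The sketch below is a minimal illustration of that definition only; the function name `n50` and the toy lengths are assumptions made for this example and are not taken from the paper or its tooling.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together account for at least half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating length
    # until at least half of the total has been covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy contig lengths (hypothetical values, in arbitrary units).
example_contigs = [100, 80, 60, 40, 20]
print(n50(example_contigs))
```

With the toy lengths above, the total is 300, and the two longest contigs (100 and 80) already cover more than half of it, so the function returns 80.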

    Identification of methylated deoxyadenosines in vertebrates reveals diversity in DNA modifications.

    Methylation of cytosine deoxynucleotides generates 5-methylcytosine (m(5)dC), a well-established epigenetic mark. However, in higher eukaryotes much less is known about modifications affecting other deoxynucleotides. Here, we report the detection of N(6)-methyldeoxyadenosine (m(6)dA) in vertebrate DNA, specifically in Xenopus laevis but also in other species including mouse and human. Our methylome analysis reveals that m(6)dA is widely distributed across the eukaryotic genome and is present in different cell types but is commonly depleted from gene exons. Thus, direct DNA modifications might be more widespread than previously thought. M.J.K. was supported by the Long-Term Human Frontiers Fellowship (LT000149/2010-L), the Medical Research Council grant (G1001690), and by the Isaac Newton Trust Fellowship (R G76588). The work was sponsored by the Biotechnology and Biological Sciences Research Council grant BB/M022994/1 (J.B.G. and M.J.K.). The Gurdon laboratory is funded by the grant 101050/Z/13/Z (J.B.G.) from the Wellcome Trust, and is supported by the Gurdon Institute core grants, namely by the Wellcome Trust Core Grant (092096/Z/10/Z) and by the Cancer Research UK Grant (C6946/A14492). C.R.B. and G.E.A. are funded by the Wellcome Trust Core Grant. We are grateful to D. Simpson and R. Jones-Green for preparing X. laevis eggs and oocytes, F. Miller for providing us with M. musculus tissue, T. Dyl for X. laevis eggs and D. rerio samples, and to Gurdon laboratory members for their critical comments. We thank U. Ruether for providing us with M. musculus kidney DNA (Entwicklungs- und Molekularbiologie der Tiere, Heinrich Heine Universitaet Duesseldorf, Germany). We also thank J. Ahringer, S. Jackson, A. Bannister and T. Kouzarides for critical input and advice, M. Sciacovelli and E. Gaude for suggestions. This is the author accepted manuscript. The final version is available from Nature Publishing Group via http://dx.doi.org/10.1038/nsmb.314

    Metagenomic identification of severe pneumonia pathogens in mechanically-ventilated patients: a feasibility and clinical validity study

    BACKGROUND: Metagenomic sequencing of respiratory microbial communities for pathogen identification in pneumonia may help overcome the limitations of culture-based methods. We examined the feasibility and clinical validity of rapid-turnaround metagenomics with Nanopore™ sequencing of clinical respiratory specimens. METHODS: We conducted a case-control study of mechanically-ventilated patients with pneumonia (nine culture-positive and five culture-negative) and without pneumonia (eight controls). We collected endotracheal aspirates and applied a microbial DNA enrichment method prior to metagenomic sequencing with the Oxford Nanopore MinION device. For reference, we compared Nanopore results against clinical microbiologic cultures and bacterial 16S rRNA gene sequencing. RESULTS: Human DNA depletion enabled in-depth sequencing of microbial communities. In culture-positive cases, Nanopore revealed communities with high abundance of the bacterial or fungal species isolated by cultures. In four cases with resistant clinical isolates, Nanopore detected antibiotic resistance genes corresponding to the phenotypic resistance in antibiograms. In culture-negative pneumonia, Nanopore revealed probable bacterial pathogens in 1/5 cases and Candida colonization in 3/5 cases. In controls, Nanopore showed high abundance of oral bacteria in 5/8 subjects, and identified colonizing respiratory pathogens in other subjects. Nanopore and 16S sequencing showed excellent concordance for the most abundant bacterial taxa. CONCLUSIONS: We demonstrated technical feasibility and proof-of-concept clinical validity of Nanopore metagenomics for severe pneumonia diagnosis, with striking concordance with positive microbiologic cultures, and clinically actionable information obtained from sequencing in culture-negative samples. Prospective studies with real-time metagenomics are warranted to examine the impact on antimicrobial decision-making and clinical outcomes.

    Fast Identification and Removal of Sequence Contamination from Genomic and Metagenomic Datasets

    High-throughput sequencing technologies have strongly impacted microbiology, providing a rapid and cost-effective way of generating draft genomes and exploring microbial diversity. However, sequences obtained from impure nucleic acid preparations may contain DNA from sources other than the sample. Those sequence contaminations are a serious concern to the quality of the data used for downstream analysis, causing misassembly of sequence contigs and erroneous conclusions. Therefore, the removal of sequence contaminants is a necessary and required step for all sequencing projects. We developed DeconSeq, a robust framework for the rapid, automated identification and removal of sequence contamination in longer-read datasets (150 bp mean read length). DeconSeq is publicly available as standalone and web-based versions. The results can be exported for subsequent analysis, and the databases used for the web-based version are automatically updated on a regular basis. DeconSeq categorizes possible contamination sequences, eliminates redundant hits with higher similarity to non-contaminant genomes, and provides graphical visualizations of the alignment results and classifications. Using DeconSeq, we conducted an analysis of possible human DNA contamination in 202 previously published microbial and viral metagenomes and found possible contamination in 145 (72%) metagenomes with as high as 64% contaminating sequences. This new framework allows scientists to automatically detect and efficiently remove unwanted sequence contamination from their datasets while eliminating critical limitations of current methods. DeconSeq's web interface is simple and user-friendly. The standalone version allows offline analysis and integration into existing data processing pipelines. DeconSeq's results reveal whether the sequencing experiment has succeeded, whether the correct sample was sequenced, and whether the sample contains any sequence contamination from DNA preparation or host. 
    In addition, the analysis of 202 metagenomes demonstrated significant contamination of the non-human associated metagenomes, suggesting that this method is appropriate for screening all metagenomes. DeconSeq is available at http://deconseq.sourceforge.net/.

    Novel Bacterial Taxa in the Human Microbiome

    The human gut harbors thousands of bacterial taxa. A profusion of metagenomic sequence data has been generated from human stool samples in the last few years, raising the question of whether more taxa remain to be identified. We assessed metagenomic data generated by the Human Microbiome Project Consortium to determine whether novel taxa remain to be discovered in stool samples from healthy individuals. To do this, we established a rigorous bioinformatics pipeline that uses sequence data from multiple platforms (Illumina GAIIX and Roche 454 FLX Titanium) and approaches (whole-genome shotgun and 16S rDNA amplicons) to validate novel taxa. We applied this approach to stool samples from 11 healthy subjects collected as part of the Human Microbiome Project. We discovered several low-abundance, novel bacterial taxa, which span three major phyla in the bacterial tree of life. We determined that these taxa are present in a larger set of Human Microbiome Project subjects and are found in two sampling sites (Houston and St. Louis). We show that the number of false-positive novel sequences (primarily chimeric sequences) would have been two orders of magnitude higher than the true number of novel taxa without validation using multiple datasets, highlighting the importance of establishing rigorous standards for the identification of novel taxa in metagenomic data. The majority of novel sequences are related to the recently discovered genus Barnesiella, further encouraging efforts to characterize the members of this genus and to study their roles in the microbial communities of the gut. A better understanding of the effects of less-abundant bacteria is important as we seek to understand the complex gut microbiome in healthy individuals and link changes in the microbiome to disease.

    A Snapshot of CNVs in the Pig Genome

    Recent studies of mammalian genomes have uncovered the extent of copy number variation (CNV) that contributes to phenotypic diversity, including health and disease status. Here we report a first account of CNVs in the pig genome, covering the parts of chromosomes 4, 7, 14, and 17 that have already been sequenced and assembled. A custom tiling oligonucleotide array with a median probe spacing of 409 bp was used to screen 12 unrelated Duroc boars that are founders of a large family material. After a strict CNV calling pipeline, 37 copy number variable regions (CNVRs) across all four chromosomes were identified, with five CNVRs overlapping segmental duplications, three overlapping pig unigenes and one overlapping a RefSeq pig mRNA. This CNV snapshot analysis is the first of its kind in the porcine genome and constitutes the basis for a better understanding of porcine phenotypes and genotypes with the prospect of identifying important economic traits.