44 research outputs found

    DNA methylation-calling tools for Oxford Nanopore sequencing: a survey and human epigenome-wide evaluation.

    BACKGROUND: Nanopore long-read sequencing technology greatly expands the capacity of long-range, single-molecule DNA-modification detection. A growing number of analytical tools have been developed to detect DNA methylation from nanopore sequencing reads. Here, we assess the performance of different methylation-calling tools to provide a systematic evaluation to guide researchers performing human epigenome-wide studies. RESULTS: We compare seven analytic tools for detecting DNA methylation from nanopore long-read sequencing data generated from human natural DNA at a whole-genome scale. We evaluate the per-read and per-site performance of CpG methylation prediction across different genomic contexts, CpG site coverage, and computational resources consumed by each tool. The seven tools exhibit different performances across the evaluation criteria. We show that the methylation prediction at regions with discordant DNA methylation patterns, intergenic regions, low CG density regions, and repetitive regions show room for improvement across all tools. Furthermore, we demonstrate that 5hmC levels at least partly contribute to the discrepancy between bisulfite and nanopore sequencing. Lastly, we provide an online DNA methylation database (https://nanome.jax.org) to display the DNA methylation levels detected by nanopore sequencing and bisulfite sequencing data across different genomic contexts. CONCLUSIONS: Our study is the first systematic benchmark of computational methods for detection of mammalian whole-genome DNA modifications in nanopore sequencing. We provide a broad foundation for cross-platform standardization and an evaluation of analytical tools designed for genome-scale modified base detection using nanopore sequencing.

    Systemic Tissue and Cellular Disruption from SARS-CoV-2 Infection revealed in COVID-19 Autopsies and Spatial Omics Tissue Maps

    The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) virus has infected over 115 million people and caused over 2.5 million deaths worldwide. Yet, the molecular mechanisms underlying the clinical manifestations of COVID-19, as well as what distinguishes them from common seasonal influenza virus and other lung injury states such as Acute Respiratory Distress Syndrome (ARDS), remain poorly understood. To address these challenges, we combined transcriptional profiling of 646 clinical nasopharyngeal swabs and 39 patient autopsy tissues, matched with spatial protein and expression profiling (GeoMx) across 357 tissue sections. These results define both body-wide and tissue-specific (heart, liver, lung, kidney, and lymph nodes) damage wrought by the SARS-CoV-2 infection, evident as a function of varying viral load (high vs. low) during the course of infection and specific transcriptional dysregulation in splicing isoforms, T cell receptor expression, and cellular expression states. In particular, cardiac and lung tissues revealed the largest degree of splicing isoform switching and cell expression state loss. Overall, these findings reveal a systemic disruption of cellular and transcriptional pathways from COVID-19 across all tissues, which can inform subsequent studies to combat the mortality of COVID-19, as well as to better understand the molecular dynamics of lethal SARS-CoV-2 infection and other viruses.

    The mitogenome of the bed bug Cimex lectularius (Hemiptera: Cimicidae)

    We report the extraction of a bed bug mitogenome from high-throughput sequencing projects originally focused on the nuclear genome of Cimex lectularius. The assembled mitogenome has an AT nucleotide composition bias similar to that found in other insects. Phylogenetic analysis of all protein-coding genes indicates that C. lectularius is clearly a member of a paraphyletic Cimicomorpha clade within the Order Hemiptera.

    Assessing Reproducibility of Inherited Variants Detected With Short-Read Whole Genome Sequencing

    Background: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. Results: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×. Conclusions: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.

    Clonal Hematopoiesis Before, During, and After Human Spaceflight.

    Clonal hematopoiesis (CH) occurs when blood cells harboring an advantageous mutation propagate faster than others. These mutations confer a risk for hematological cancers and cardiovascular disease. Here, we analyze CH in blood samples from a pair of twin astronauts over 4 years in bulk and fractionated cell populations using a targeted CH panel, linked-read whole-genome sequencing, and deep RNA sequencing. We show CH with distinct mutational profiles and increasing allelic fraction that includes a high-risk TET2 clone in one subject and two DNMT3A mutations on distinct alleles in the other twin. These astronauts exhibit CH almost two decades prior to the mean age at which it is typically detected and show larger shifts in clone size than age-matched controls or radiotherapy patients, based on a longitudinal cohort of 157 cancer patients. As such, longitudinal monitoring of CH may serve as an important metric for overall cancer and cardiovascular risk in astronauts.

    Assessing reproducibility of inherited variants detected with short-read whole genome sequencing

    Background: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. Results: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30x. Conclusions: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.

    System-wide transcriptome damage and tissue identity loss in COVID-19 patients

    The molecular mechanisms underlying the clinical manifestations of coronavirus disease 2019 (COVID-19), and what distinguishes them from common seasonal influenza virus and other lung injury states such as acute respiratory distress syndrome, remain poorly understood. To address these challenges, we combine transcriptional profiling of 646 clinical nasopharyngeal swabs and 39 patient autopsy tissues to define body-wide transcriptome changes in response to COVID-19. We then match these data with spatial protein and expression profiling across 357 tissue sections from 16 representative patient lung samples and identify tissue-compartment-specific damage wrought by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, evident as a function of varying viral loads during the clinical course of infection and tissue-type-specific expression states. Overall, our findings reveal a systemic disruption of canonical cellular and transcriptional pathways across all tissues, which can inform subsequent studies to combat the mortality of COVID-19 and to better understand the molecular dynamics of lethal SARS-CoV-2 and other respiratory infections.
    • Across all organs, fibroblast and immune cell populations increase in COVID-19 patients
    • Organ-specific cell types and functional markers are lost in all COVID-19 tissue types
    • Lung compartment identity loss correlates with SARS-CoV-2 viral loads
    • COVID-19 uniquely disrupts co-occurrence cell type clusters (different from IAV/ARDS)
    Park et al. report system-wide transcriptome damage and tissue identity loss wrought by SARS-CoV-2, influenza, and bacterial infection across multiple organs (heart, liver, lung, kidney, and lymph nodes) and provide a spatiotemporal landscape of COVID-19 in the lung.

    Data for constructing insect genome content matrices for phylogenetic analysis and functional annotation

    Twenty-one fully sequenced and well-annotated insect genomes were used to construct genome content matrices for phylogenetic analysis and functional annotation of insect genomes. To examine the role of e-value cutoff in ortholog determination, we used scaled e-value cutoffs and a single linkage clustering approach. The present communication includes (1) a list of the genomes used to construct the genome content phylogenetic matrices, (2) a nexus file with the data matrices used in phylogenetic analysis, (3) a nexus file with the Newick trees generated by phylogenetic analysis, (4) an Excel file listing the Core (CORE) genes and Unique (UNI) genes found in five insect groups, and (5) a figure showing a plot of consistency index (CI) versus percent of unannotated genes that are apomorphies in the data set for gene losses and gains and bar plots of gains and losses for four consistency index (CI) cutoffs.

    The Road To Cnidaria: History of Phylogeny of the Myxozoa


    In silico hybridization enables transcriptomic illumination of the nature and evolution of Myxozoa

    Background: The Myxozoa, a group of oligocellular, obligate endoparasites, has long been poorly understood in an evolutionary context. Recent genome-level sequencing techniques such as RNA-seq have generated large amounts of myxozoan sequence data, providing valuable insight into their evolutionary history. However, sequences from host tissue contamination are present in next-generation sequencing reactions of myxozoan tissue, and differentiating between the two has been inadequately addressed. In order to shed light on the genetic underpinnings of myxozoan biology, assembled contigs generated from these studies that derived from the myxozoan must be decoupled from transcripts derived from host tissue and other contamination. This study describes a pipeline for categorization of transcripts as myxozoan based on similarity searching with known host and parasite sequences, explores the extent to which host contamination is present in previously existing myxozoan datasets, and implements this pipeline on a newly sequenced transcriptome of Myxobolus pendula, a parasite of the common creek chub gill arch. Methods: The in silico hybridization pipeline uses iterative BLAST searching and database-driven e-value comparison to categorize transcripts as deriving from host, parasite, or other contamination. Functional genetic analysis of M. pendula was conducted using further BLAST searching, Hidden Markov Modeling, and sequence alignment and phylogenetic reconstruction. Results: Three RNA libraries of encysted M. pendula plasmodia were sequenced and subjected to the method. Nearly half of the final set of contiguous assembly sequences (47.3 %) was identified as putative myxozoan transcripts. Putative contamination was also identified in at least one third of previously published myxozoan transcripts. The set of M. pendula transcripts was mined for a range of biologically insightful genes, including taxonomically restricted nematocyst structural proteins and nematocyst proteins identified through tandem mass spectrometry of other cnidarians. Several novel findings emerged, including a fourth myxozoan minicollagen gene, putative myxozoan toxin proteins, and extracellular matrix glycoproteins. Conclusions: This study serves as a model for the handling of next-generation myxozoan sequence. The need for careful categorization was demonstrated in both previous and new sets of myxozoan sequences. The final set of confidently assigned myxozoan transcripts can be mined for any biologically relevant gene or gene family without spurious misidentification of host contamination as a myxozoan homolog. As exemplified by M. pendula, the repertoire of myxozoan polar capsules may be more complex than previously thought, with an additional minicollagen homolog and putative expression of toxin proteins.