18 research outputs found

    Fast and Sensitive Genome-Hashing Software and its Application in Using NGS as a Detection Agent for Bacterial Presence in Oral Metagenomic Samples

    Next generation sequencing has increased the throughput of sequenced DNA into the range of billions of nucleotides sequenced per day. With the increased speed of DNA sequencing and the short length of reads produced by next generation sequencers, a significant challenge has been created in quickly and accurately assembling the hundreds of millions of short reads created by modern sequencing instruments into their full genomic sequences. With the increase in throughput in next generation sequencing and the decrease in time and cost to perform DNA sequencing, novel applications for DNA sequencing are being considered. Among them is a methodology by which DNA sequencing can be used as a diagnostic or detection tool for bacterial infection or presence. Here, the implementation, characteristics, and deployment of a novel, genome-hashing alignment algorithm for quickly performing reference-based alignment are described. This algorithm, SRmapper, is shown to be two-fold to eight-fold faster than a current and popular alignment algorithm, BWA, while retaining a similar fraction of reads aligned to the human reference genome. SRmapper demonstrates a capability to align approximately 150 billion nucleotides per processor day on an Intel Xeon 2.8GHz processor to the human genome while using approximately 2.5GB of RAM. SRmapper is demonstrated to be able to perform both single-end and paired-end alignment and tolerates a higher number of discrepancies between reads and the reference sequence than BWA. Using SRmapper as an alignment tool, a method to detect Mycobacterium tuberculosis (TB) in metagenomic samples containing many different bacteria is described. This method utilizes the construction of a novel uniqueness genome for TB containing only the regions of the TB genome not similar to any other bacterial species in the oral metagenome. Alignment of simulated and real metagenomic samples demonstrates the effectiveness of the uniqueness genome in the detection of TB and reveals TB contamination in samples from the 1000 genomes project. Finally, the uniqueness genomes methodology is expanded to all genomes within the oral metagenome, and preliminary evidence is provided demonstrating that next generation sequencing can detect the presence of multiple bacteria simultaneously via alignment using SRmapper
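
The abstract above does not give SRmapper's internals, so the following is only a minimal sketch of the general genome-hashing idea it names: index every k-mer of a reference in a hash table, look up seeds from each read, and verify candidate placements by counting mismatches. The function names, the non-overlapping seed scheme, and the ungapped verification are all assumptions made for illustration and should not be read as SRmapper's actual method.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map each k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def count_mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def align_read(read, reference, index, k, max_mismatches):
    """Return (start, mismatches) for the best ungapped placement, or None.

    Hypothetical seeding scheme: non-overlapping k-mers taken from the read.
    Each exact seed hit proposes a start position, which is then verified
    against the reference by counting mismatches over the whole read.
    """
    best = None
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for hit in index.get(seed, ()):
            start = hit - offset
            if start < 0 or start + len(read) > len(reference):
                continue  # placement would run off the reference
            mismatches = count_mismatches(read, reference[start:start + len(read)])
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches)
    return best
```

A read carrying a single substitution can still be placed under this scheme as long as at least one of its seeds matches the reference exactly, which is why tolerance to discrepancies depends on the seed length and the number of seeds tried.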

    Tissue-specific usage of transposable element-derived promoters in mouse development

    BACKGROUND: Transposable elements (TEs) are a significant component of eukaryotic genomes and play essential roles in genome evolution. Mounting evidence indicates that TEs are highly transcribed in early embryo development and contribute to distinct biological functions and tissue morphology. RESULTS: We examine the epigenetic dynamics of mouse TEs during the development of five tissues: intestine, liver, lung, stomach, and kidney. We found that TEs are associated with over 20% of open chromatin regions during development. Close to half of these accessible TEs are only activated in a single tissue and a specific developmental stage. Most accessible TEs are rodent-specific. Across these five tissues, 453 accessible TEs are found to create the transcription start sites of downstream genes in mouse, including 117 protein-coding genes and 144 lincRNA genes, 93.7% of which are mouse-specific. Species-specific TE-derived transcription start sites are found to drive the expression of tissue-specific genes and change their tissue-specific expression patterns during evolution. CONCLUSION: Our results suggest that TE insertions increase the regulatory potential of the genome, and some TEs have been domesticated to become crucial components of genes and regulate tissue-specific expression during mouse tissue development

    A genetic screen for regulators of muscle morphogenesis in Drosophila

    The mechanisms that determine the final topology of skeletal muscles remain largely unknown. We have been developing Drosophila body wall musculature as a model to identify and characterize the pathways that control muscle size, shape, and orientation during embryogenesis (Johnson et al., 2013; Williams et al., 2015; Yang et al., 2020a; Yang et al., 2020b). Our working model argues that muscle morphogenesis is regulated by (1) extracellular guidance cues that direct muscle cells toward muscle attachment sites, and (2) contact-dependent interactions between muscles and tendon cells. While we have identified several pathways that regulate muscle morphogenesis, our understanding is far from complete. Here we report the results of a recent EMS-based forward genetic screen that identified a myriad of loci not previously associated with muscle morphogenesis. We recovered new alleles of known muscle morphogenesis genes, including back seat driver, kon-tiki, thisbe, and tumbleweed, arguing that our screen had the depth and precision to uncover myogenic genes. We also identified new alleles of spalt-major, barren, and patched that presumably disrupt independent muscle morphogenesis pathways. Equally important, our screen shows that at least 11 morphogenetic loci remain to be mapped and characterized. Our screen has developed exciting new tools to study muscle morphogenesis, which may provide future insights into the mechanisms that regulate skeletal muscle topology

    AIAP: A quality control and integrative analysis package to improve ATAC-seq data analysis

    Assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) is a technique widely used to investigate genome-wide chromatin accessibility. The recently published Omni-ATAC-seq protocol substantially improves the signal/noise ratio and reduces the input cell number. High-quality data are critical to ensure accurate analysis. Several tools have been developed for assessing sequencing quality and insertion size distribution for ATAC-seq data; however, key quality control (QC) metrics have not yet been established to accurately determine the quality of ATAC-seq data. Here, we optimized the analysis strategy for ATAC-seq and defined a series of QC metrics for ATAC-seq data, including reads under peak ratio (RUPr), background (BG), promoter enrichment (ProEn), subsampling enrichment (SubEn), and other measurements. We incorporated these QC tests into our recently developed ATAC-seq Integrative Analysis Package (AIAP) to provide a complete ATAC-seq analysis system, including quality assurance, improved peak calling, and downstream differential analysis. We demonstrated a significant improvement of sensitivity (20%-60%) in both peak calling and differential analysis by processing paired-end ATAC-seq datasets using AIAP. AIAP is compiled into Docker/Singularity, and it can be executed by one command line to generate a comprehensive QC report. We used ENCODE ATAC-seq data to benchmark and generate QC recommendations, and developed qATACViewer for the user-friendly interaction with the QC report. The software, source code, and documentation of AIAP are freely available at https://github.com/Zhang-lab/ATAC-seq_QC_analysis
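
The abstract names reads under peak ratio (RUPr) without defining it. As a rough sketch, assuming it means the fraction of reads that overlap at least one called peak, it could be computed as below. The function names, the (chrom, start, end) tuple layout, and the half-open coordinate convention are assumptions for illustration, not AIAP's implementation.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def reads_under_peak_ratio(reads, peaks):
    """Fraction of reads overlapping any peak.

    reads, peaks: lists of (chrom, start, end) with half-open coordinates.
    Assumed definition: reads overlapping a peak / total reads.
    """
    if not reads:
        return 0.0
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    merged = {chrom: merge_intervals(ivs) for chrom, ivs in by_chrom.items()}
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in merged.items()}

    under = 0
    for chrom, start, end in reads:
        if chrom not in merged:
            continue
        # Last merged peak that begins before the read ends.
        i = bisect_right(starts[chrom], end - 1) - 1
        if i >= 0 and merged[chrom][i][1] > start:
            under += 1
    return under / len(reads)
```

Merging the peaks first keeps the lookup to one binary search per read, since among disjoint sorted peaks only the last one starting before the read's end can overlap it.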

    FeatSNP: An Interactive Database for Brain-Specific Epigenetic Annotation of Human SNPs

    FeatSNP is an online tool and a curated database for exploring 81 million common SNPs’ potential functional impact on the human brain. FeatSNP uses the brain transcriptomes of the human population to improve functional annotation of human SNPs by integrating transcription factor binding prediction, public eQTL information, and brain specific epigenetic landscape, as well as information of Topologically Associating Domains (TADs). FeatSNP supports both single and batched SNP searching, and its interactive user interface enables users to explore the functional annotations and generate publication-quality visualization results. FeatSNP is freely available on the internet at FeatSNP.org with all major web browsers supported

    Cellular and molecular characterization of multiplex autism in human induced pluripotent stem cell-derived neurons

    Background: Autism spectrum disorder (ASD) is a neurodevelopmental disorder with pronounced heritability in the general population. This is largely attributable to the effects of polygenic susceptibility, with inherited liability exhibiting distinct sex differences in phenotypic expression. Attempts to model ASD in human cellular systems have principally involved rare de novo mutations associated with ASD phenocopies. However, by definition, these models are not representative of polygenic liability, which accounts for the vast share of population-attributable risk. Methods: Here, we performed what is, to our knowledge, the first attempt to model multiplex autism using patient-derived induced pluripotent stem cells (iPSCs) in a family manifesting incremental degrees of phenotypic expression of inherited liability (absent, intermediate, severe). The family members share an inherited variant of uncertain significance (VUS) in Results: cExN neurospheres from the two affected individuals were reduced in size, compared to those derived from unaffected related and unrelated individuals. This reduction was, at least in part, due to increased apoptosis of cells from affected individuals upon initiation of cExN neural induction. Likewise, cIN neural progenitor cells from affected individuals exhibited increased apoptosis, compared to both unaffected individuals. Transcriptomic analysis of both cExN and cIN neural progenitor cells revealed distinct molecular signatures associated with affectation, including the misregulation of suites of genes associated with neural development, neuronal function, and behavior, as well as altered expression of ASD risk-associated genes. Conclusions: We have provided evidence of morphological, physiological, and transcriptomic signatures of polygenic liability to ASD from an analysis of cellular models derived from a multiplex autism family. ASD is commonly inherited on the basis of additive genetic liability. 
Therefore, identifying convergent cellular and molecular phenotypes resulting from polygenic and monogenic susceptibility may provide a critical bridge for determining which of the disparate effects of rare highly deleterious mutations might also apply to common autistic syndromes

    Altered neuronal physiology, development, and function associated with a common chromosome 15 duplication involving CHRNA7

    BACKGROUND: Copy number variants (CNVs) linked to genes involved in nervous system development or function are often associated with neuropsychiatric disease. While CNVs involving deletions generally cause severe and highly penetrant patient phenotypes, CNVs leading to duplications tend instead to exhibit widely variable and less penetrant phenotypic expressivity among affected individuals. CNVs located on chromosome 15q13.3 affecting the alpha-7 nicotinic acetylcholine receptor subunit (CHRNA7) gene contribute to multiple neuropsychiatric disorders with highly variable penetrance. However, the basis of such differential penetrance remains uncharacterized. Here, we generated induced pluripotent stem cell (iPSC) models from first-degree relatives with a 15q13.3 duplication and analyzed their cellular phenotypes to uncover a basis for the dissimilar phenotypic expressivity. RESULTS: The first-degree relatives studied included a boy with autism and emotional dysregulation (the affected proband-AP) and his clinically unaffected mother (UM), with comparison to unrelated control models lacking this duplication. Potential contributors to neuropsychiatric impairment were modeled in iPSC-derived cortical excitatory and inhibitory neurons. The AP-derived model uniquely exhibited disruptions of cellular physiology and neurodevelopment not observed in either the UM or unrelated controls. These included enhanced neural progenitor proliferation but impaired neuronal differentiation, maturation, and migration, and increased endoplasmic reticulum (ER) stress. Both the neuronal migration deficit and elevated ER stress could be selectively rescued by different pharmacologic agents. Neuronal gene expression was also dysregulated in the AP, including reduced expression of genes related to behavior, psychological disorders, neuritogenesis, neuronal migration, and Wnt, axonal guidance, and GABA receptor signaling. 
The UM model instead exhibited upregulated expression of genes in many of these same pathways, suggesting that molecular compensation could have contributed to the lack of neurodevelopmental phenotypes in this model. However, both AP- and UM-derived neurons exhibited shared alterations of neuronal function, including increased action potential firing and elevated cholinergic activity, consistent with increased homomeric CHRNA7 channel activity. CONCLUSIONS: These data define both diagnosis-associated cellular phenotypes and shared functional anomalies related to CHRNA7 duplication that may contribute to variable phenotypic penetrance in individuals with 15q13.3 duplication. The capacity for pharmacological agents to rescue some neurodevelopmental anomalies associated with diagnosis suggests avenues for intervention for carriers of this duplication and other CNVs that cause related disorders