6 research outputs found

    Read Annotation Pipeline for High-Throughput Sequencing Data

    Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest that aligning to a single standard reference sequence, as is common practice, can introduce a bias that depends on the genetic distance of the target sequences from the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, various limitations and problems remain unsolved, including reduced mapping ratios, shifts in read mappings, and the choice of which variants to include to remove biases. To address these issues, we propose a novel and generic multi-alignment pipeline. Our pipeline integrates the genomic variations from known or suspected founders into separate reference sequences and performs alignments to each one. By mapping reads to multiple reference sequences and merging the alignments afterward, we are able to rescue more reads and diminish the bias caused by using a single common reference. Moreover, the genomic origin of each read is determined and annotated during the merging process, providing a better source of information for assessing differential expression than simple allele queries at known variant positions. Using RNA-seq of a diallel cross, we compare our pipeline with the single-reference pipeline and demonstrate its advantages: more aligned reads and a higher percentage of reads with assigned origins.
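
    The merging step described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the `Alignment` record, the `merge_alignments` function, the founder labels, and the fewest-mismatches rule for assigning an origin are all assumptions, and positions are assumed to have already been translated into a common coordinate system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """One alignment of a read against one founder-specific reference (hypothetical record)."""
    read_id: str
    reference: str   # hypothetical founder label, e.g. "founderA"
    position: int    # assumed to be already translated to common coordinates
    mismatches: int


def merge_alignments(alignments):
    """Merge per-reference alignments into one record per read.

    Assumed rule: keep the alignment with the fewest mismatches; if several
    references tie for the best score, the origin is reported as "ambiguous".
    Returns a dict mapping read_id -> (position, origin).
    """
    by_read = {}
    for aln in alignments:
        by_read.setdefault(aln.read_id, []).append(aln)

    merged = {}
    for read_id, alns in by_read.items():
        best = min(a.mismatches for a in alns)
        best_alns = [a for a in alns if a.mismatches == best]
        winners = sorted({a.reference for a in best_alns})
        origin = winners[0] if len(winners) == 1 else "ambiguous"
        merged[read_id] = (best_alns[0].position, origin)
    return merged


example = [
    Alignment("r1", "founderA", 100, 0),
    Alignment("r1", "founderB", 100, 2),
    Alignment("r2", "founderA", 250, 1),
    Alignment("r2", "founderB", 250, 1),
    Alignment("r3", "founderB", 400, 0),  # maps to only one reference
]
merged = merge_alignments(example)
```

    In this toy input, a read such as `r3` that aligns to only one founder reference is still kept, which mirrors the idea of rescuing reads that a single common reference would miss.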

    Analysis of subcellular RNA fractions demonstrates significant genetic regulation of gene expression in human brain post-transcriptionally

    Gaining insight into the genetic regulation of gene expression in human brain is key to the interpretation of genome-wide association studies for major neurological and neuropsychiatric diseases. Expression quantitative trait loci (eQTL) analyses have largely been used to achieve this, providing valuable insights into the genetic regulation of steady-state RNA in human brain, but without distinguishing between the molecular processes that regulate transcription and those that regulate stability. RNA quantification within cellular fractions can disentangle these processes in cell types and tissues that are challenging to model in vitro. We investigated the underlying molecular processes driving the genetic regulation of gene expression specific to a cellular fraction using allele-specific expression (ASE). Applying ASE analysis to genomic and transcriptomic data from paired nuclear and cytoplasmic fractions of anterior prefrontal cortex, cerebellar cortex and putamen tissues from four post-mortem, neuropathologically confirmed control human brains, we demonstrate that a significant proportion of genetic regulation of gene expression occurs post-transcriptionally in the cytoplasm, with genes undergoing this form of regulation more likely to be synaptic. These findings have implications for understanding the structure of gene expression regulation in human brain and, importantly, for the interpretation of rapidly growing single-nucleus brain RNA-sequencing and eQTL datasets, where cytoplasm-specific regulatory events could be missed.

    Correcting Reference Bias in High-throughput Sequencing Analysis

    Mapping reads to a reference sequence is a common step when analyzing high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest that aligning to a single standard reference sequence, as is common practice, can introduce a bias that depends on the genetic distance of the target sequences from the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, various limitations and problems remain unsolved, including reduced mapping ratios, shifts in read mappings, and the choice of which variants to include to remove biases. To address these issues, I propose novel and generic pipelines that integrate the genomic variations from known or suspected founders into reference sequences and then perform read alignment. Experiments show that my pipelines can align more reads with much lower reference bias than the traditional pipeline, in which reads are mapped against the standard reference sequence. They can be applied to a wide range of organisms, including inbreds, F1s, and outbreds, and to various high-throughput sequencing approaches, such as RNA-seq, DNA-seq, and ChIP-seq.
    Doctor of Philosophy

    EXTREME HYBRID GROWTH, GENOMIC IMPRINTING, THE LARGE X EFFECT, AND THE DRIVERS OF SPECIATION IN MAMMALS

    Mammalian hybrids often show abnormal growth, indicating that developmental inviability may play an important role in mammalian speciation. Yet it is unclear whether this recurrent phenotype reflects a common genetic basis. Here I describe patterns of hybrid inviability between two closely related species of dwarf hamsters, Phodopus campbelli and P. sungorus. Using genetic crosses, I found extreme parent-of-origin dependent growth in hybrid embryos and placentas. Abnormal growth in hybrid mammals has been empirically linked to genomic imprinting, the parent-specific silencing of a single allele that occurs in many genes involved in regulating embryonic growth. Epigenetic disruptions of genomic imprinting activate transcription of the normally silenced allele and are thought to increase expression level. Higher expression of genes whose imprinting is disrupted may cause a dosage imbalance between growth factors and repressors, ultimately leading to abnormal embryonic growth. I next tested the general prediction that disrupted imprinting leads to increased expression of growth-promoting genes in large F1 hybrid hamsters from the genus Phodopus. I found that disrupted imprinting correlates strongly with placental growth and with changes in the expression level of imprinted genes, but that widespread disruptions in the silencing of maternally expressed genes are associated with lower, not higher, gene expression. As maternally expressed genes tend to repress offspring growth, these data suggest that overgrowth is associated with a reduced level of growth repressors rather than an excess of growth factors. Asymmetric hybrid phenotypes imply a genetic basis that is uniparentally inherited, for example the X chromosome, mitochondria, or imprinted genes. Hybrid dwarf hamsters in the genus Phodopus exhibit extreme parent-of-origin dependent growth of both placenta and embryos.
Finally, I used a suite of genetic and genomic experiments to test whether the X chromosome, the mitochondria, or imprinted genes are involved in parent-of-origin dependent growth in hybrid dwarf hamsters. I demonstrated a major role for the maternally inherited X chromosome and widespread disruptions of expression of autosomal genes, including imprinted genes, but no influence of the mitochondria. My data suggest that an incompatible interaction involving the maternally inherited P. sungorus X chromosome and a paternally inherited P. campbelli autosomal element results in placental and embryonic overgrowth. Overgrowth is also correlated with greatly reduced expression of maternally expressed imprinted genes, though any connection between expression and the X chromosome remains unclear.

    SPERMATOGENESIS MOLECULAR EVOLUTION IN MURINE RODENTS

    Reproductive traits are fascinating from an evolutionary perspective because they are necessary for individuals to produce offspring and increase their evolutionary fitness. Given how essential reproduction is to fitness, genes involved in reproduction may be expected to be highly conserved. However, some genes involved in reproduction evolve very rapidly, including many spermatogenesis genes. This rapid evolution may result from intense sexual selection acting on reproductive traits, particularly in species where females mate multiply, thus creating the potential for sperm competition. In addition to sexual selection, other evolutionary forces may shape rapid spermatogenesis evolution, including genomic conflict and relaxed pleiotropic constraint due to the high specificity of genes involved in spermatogenesis. It is unclear how these forces may interact, what their relative importance is in spermatogenesis molecular evolution, and how their intensity changes across spermatogenesis developmental stages. Rapid spermatogenesis evolution is thought to have important downstream consequences, including rapid phenotypic evolution of male reproductive traits and reproductive barriers that contribute to speciation. However, direct connections between molecular evolution, phenotypic evolution, and speciation have rarely been made for male reproductive traits. Thus, my dissertation seeks to understand the causes and consequences of rapid spermatogenesis molecular evolution. House mice (Mus musculus) and closely related species are an ideal system in which to address this question because they experience sperm competition, form natural hybrid zones and produce sterile hybrid males, readily breed and hybridize in the laboratory, and have extensive genomic resources available.
Furthermore, house mice are part of the massive Murinae subfamily of rodents, which comprises over 10% of all mammal species and shows remarkable variation in reproductive traits, including sperm morphology. Spermatogenesis is a complex developmental process, so understanding variation in the intensity of different evolutionary forces across spermatogenesis stages is critical to understanding spermatogenesis evolution. Fluorescence-activated cell sorting is one way to generate enriched cell populations representing different spermatogenesis stages. In this dissertation, I use gene expression data from sorted cell populations in house mice, as well as genomic and phenotypic data from mice and other murine rodents, to study mammalian spermatogenesis evolution. In Chapter 1, I use data from enriched cell populations representing two different spermatogenesis stages and four different species of mice to investigate the relative rates of molecular evolution across spermatogenesis and the types of mutations underlying gene expression evolution in different spermatogenesis stages. I show that lineage-specificity of genes expressed, gene expression level divergence, and protein sequence divergence all increase during the late stages of spermatogenesis. I also show that protein-coding divergence, but not gene expression divergence, is higher on the X chromosome than on the autosomes across spermatogenesis cell types. Lastly, I use published data from F1 mouse crosses to perform allele-specific expression analyses and show that the types of regulatory mutations underlying expression divergence are strikingly different between early and late spermatogenesis. This study provides insight into mammalian spermatogenesis molecular evolution and shows the importance of developmental context in molecular evolutionary studies.
In Chapter 2, I perform two genetic experiments involving advanced-generation hybrid mouse crosses to explore hybrid incompatibilities on the sex chromosomes and their effects on hybrid male spermatogenesis expression and reproductive phenotypes. My results refute the hypothesis that genomic conflict between the sex chromosomes contributes to sex chromosome overexpression during late spermatogenesis in sterile mouse hybrids. However, they do show that incompatibilities between the X and Y chromosomes, between the Y chromosome and autosomes, or both likely contribute to male hybrid sterility in house mice. These findings advance our understanding of genetic incompatibilities contributing to male hybrid sterility, a common barrier to reproduction between species. In Chapter 3, I expand my research on spermatogenesis evolution to the Murinae subfamily, using exome capture and phenotype data to investigate the role of sexual selection in sperm morphological evolution and to test for positive selection acting on male reproductive genes. My analyses indicate that relative testes mass is evolving independently of phylogeny and may therefore be evolving in response to sperm competition. Most Murinae sperm have a hook on the sperm head, and I show that hook length and angle are correlated with relative testes mass, suggesting that these traits may also be under selection from sperm competition. Lastly, I find that genes expressed in rapidly evolving male reproductive tissues and spermatogenesis cell types, specifically seminal vesicles and postmeiotic spermatids, tend to experience more positive selection than other male reproductive genes, so their rapid evolution is likely due in part to positive selection. These findings contribute to our understanding of the underlying causes of the rapid evolution of reproduction at both the phenotypic and molecular levels.
In addition to these three chapters, I contributed to several related projects that address the overarching questions of my dissertation: a review on sex chromosome evolution in mammals in the context of spermatogenesis (Larson et al. 2018), two methodological papers on quantifying sperm morphology (Skinner et al. 2019a; Skinner et al. 2019b), a peer-reviewed research article on disrupted X chromosome expression at different spermatogenesis stages in sterile house mouse hybrids (Larson et al. 2021), and a study on X chromosome evolution in dwarf hamsters (Moore et al. 2022). Collectively, my dissertation and related projects contribute to our understanding of reproduction and molecular evolution in mammals.

    Collaborative Cross Graphical Genome

    Reference genomes are the foundation of most bioinformatic pipelines. They are conventionally represented as a set of single-sequence assembled contigs, referred to as linear genomes. The rapid growth of sequencing technologies has driven the advent of pangenomes that integrate multiple genome assemblies in a single representation. Graphs are commonly used in pangenome models. However, graph-based pangenome representations and operations pose challenges. This dissertation introduces methods for reference pangenome construction, genomic feature annotation, and tools for analyzing population-scale sequence data based on a graphical pangenome model. First, we develop a genome registration tool for constructing a reference pangenome model by merging multiple linear genome assemblies and annotations into a graphical genome. Second, we develop a graph-based coordinate framework and discuss strategies for referring to, annotating, and comparing genomic features in a graphical pangenome model. We demonstrate that the graph coordinate system simplifies assembly and annotation updates by identifying and segmenting updated sequences in a specific genomic region. Third, we develop an alignment-free method to analyze population-scale sequence data based on a pangenome model. We demonstrate the application of our methods by constructing pangenome models for a mouse genetic reference population, the Collaborative Cross. The pangenome framework proposed in this dissertation simplifies the maintenance and management of massive genomic data and establishes a novel data structure for analyzing, visualizing, and comparing genomic features in an intra-specific population.
    Doctor of Philosophy