Genomic insights into fine-scale recombination variation in adaptively diverging threespine stickleback fish (Gasterosteus aculeatus)
Meiotic recombination is one of the major molecular mechanisms generating
genetic diversity and influencing genome evolution. By shuffling allelic
combinations, it can directly influence the patterns and efficacy of natural
selection. Studies in various organisms have shown that the rate and placement of
recombination vary substantially within the genome, among individuals,
between sexes and among different species. It is hypothesized that this variation
plays an important role in genome evolution. In this PhD thesis, I investigated the
extent and molecular basis of recombination variation in adaptively diverging
threespine stickleback fish (Gasterosteus aculeatus) to further understand its
evolutionary implications. I used ChIP-sequencing and whole-genome
sequencing of pedigrees to empirically identify and quantify double-strand breaks
(DSBs) and meiotic crossovers (COs), respectively. Whole-genome sequencing of large nuclear
families was performed to identify meiotic crossovers in 36 individuals of
diverging marine and freshwater ecotypes and their hybrids. This produced the
first genome-wide, high-resolution, sex- and ecotype-specific map of
contemporary recombination events in sticklebacks. The results show striking
differences in crossover number and placement between the sexes. Females recombine
about 1.76 times more than males, and their COs are distributed along the whole
chromosome, whereas male COs occur predominantly near the chromosomal
periphery. When compared among ecotypes, hybrid females showed a significant reduction
in overall recombination rate relative to the pure forms. Even
though the known loci underlying marine-freshwater adaptive divergence tend to
fall in regions of low recombination, considerable female recombination is
observed in the regions between adaptive loci. This suggests that the sexual
dimorphism in recombination phenotype may have important evolutionary
implications.
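Crossovers in a pedigree are typically located by following which of a parent's two haplotypes each offspring inherited along a chromosome; a crossover sits wherever that transmitted haplotype switches. The sketch below illustrates only this general idea. It assumes genotypes are already phased so that each informative marker is labelled 0 or 1 for the transmitted parental haplotype, and the function name `call_crossovers` and the `min_run` noise filter are hypothetical choices, not the pipeline used in the thesis.

```python
from typing import List, Tuple


def call_crossovers(positions: List[int], hap: List[int],
                    min_run: int = 2) -> List[Tuple[int, int]]:
    """Return (left, right) position intervals bracketing each inferred crossover.

    positions: sorted marker coordinates on one chromosome.
    hap: transmitted parental haplotype (0 or 1) at each marker (assumed phased).
    min_run: hypothetical filter; haplotype blocks spanning fewer markers than
             this are discarded as likely genotyping errors.
    """
    if len(positions) != len(hap):
        raise ValueError("positions and hap must have equal length")

    # Collapse marker calls into runs of identical haplotype: [hap, first_idx, last_idx].
    runs = []
    for i, h in enumerate(hap):
        if runs and runs[-1][0] == h:
            runs[-1][2] = i
        else:
            runs.append([h, i, i])

    # Drop short runs, then merge neighbouring runs that now carry the same haplotype.
    kept = [r for r in runs if r[2] - r[1] + 1 >= min_run]
    merged = []
    for r in kept:
        if merged and merged[-1][0] == r[0]:
            merged[-1][2] = r[2]
        else:
            merged.append(list(r))

    # Each remaining switch is one crossover, localized between the last marker of
    # one block and the first marker of the next block.
    return [(positions[a[2]], positions[b[1]]) for a, b in zip(merged, merged[1:])]
```

For example, `call_crossovers([100, 200, 300, 400, 500, 600, 700, 800], [0, 0, 0, 1, 0, 0, 1, 1])` treats the isolated call at position 400 as noise and reports a single crossover between 600 and 700. Summing such calls per meiosis, separately for mothers and fathers, yields the kind of sex-specific crossover counts whose ratio the abstract reports.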
At fine scale, COs and male DSBs are distributed nonrandomly, forming
"semi-hot" hotspots and coldspots of recombination. I report a significant
association of male DSBs and COs with functionally active open chromatin regions
such as gene promoters, whereas female COs showed no association beyond that
expected by chance. However, the considerable number of COs and DSBs located away
from all of the tested open chromatin marks suggests that additional, novel
mechanisms of recombination regulation may operate in sticklebacks.
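Whether events overlap a genomic feature "more than expected by chance" is commonly assessed with a permutation test: count observed overlaps, then compare with counts obtained after randomly relocating the same number of events. The sketch below is a generic version of that test, not necessarily the one used in the thesis. It assumes a single chromosome, events given as single positions (for example crossover-interval midpoints), sorted non-overlapping feature intervals such as promoters, and a uniform null placement; the function names are hypothetical, and a real analysis would likely use a null that preserves local marker density or mappability.

```python
import bisect
import random
from typing import List, Tuple


def count_overlaps(points: List[int], intervals: List[Tuple[int, int]]) -> int:
    """Count points inside any half-open [start, end) interval.

    intervals must be sorted by start and non-overlapping.
    """
    starts = [s for s, _ in intervals]
    hits = 0
    for p in points:
        i = bisect.bisect_right(starts, p) - 1  # last interval starting at or before p
        if i >= 0 and p < intervals[i][1]:
            hits += 1
    return hits


def permutation_enrichment(points: List[int], intervals: List[Tuple[int, int]],
                           chrom_len: int, n_perm: int = 1000,
                           seed: int = 1) -> Tuple[int, float, float]:
    """Return (observed overlaps, mean null overlaps, one-sided empirical p-value).

    The null places each event uniformly at random on [0, chrom_len), an
    assumption made here for simplicity.
    """
    rng = random.Random(seed)
    observed = count_overlaps(points, intervals)
    null = []
    for _ in range(n_perm):
        shuffled = [rng.randrange(chrom_len) for _ in points]
        null.append(count_overlaps(shuffled, intervals))
    # The add-one correction avoids reporting an impossible p-value of exactly 0.
    p_value = (1 + sum(n >= observed for n in null)) / (n_perm + 1)
    return observed, sum(null) / n_perm, p_value
```

A small p-value together with an observed count well above the null mean would indicate enrichment. An observed count close to the null mean, as reported for female COs, is consistent with no association beyond chance under this null.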
In addition, we developed a novel method for constructing individualized
recombination maps from pooled gamete DNA using linked read sequencing
technology by 10X Genomics®. We tested the method by contrasting recombination
profiles of gametic and somatic tissue from a hybrid mouse and stickleback fish.
Our pipeline faithfully detects previously described recombination hotspots in
mice at high resolution and identifies many novel hotspots across the genome in
both species, thereby demonstrating the efficiency of the novel method. This
method could be employed in large-scale QTL mapping studies to further
understand the genetic basis of recombination variation reported in this thesis.
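The principle behind recombination mapping from pooled gametes with linked reads can be sketched as follows. Reads sharing a barcode largely derive from one long DNA molecule, and in gamete DNA each molecule comes from a single haploid genome. If the donor's heterozygous SNPs have been phased into haplotypes A and B, a molecule whose alleles match A on one side and B on the other records a crossover in that gamete. This is a simplified illustration, not the published pipeline: it assumes each SNP on the molecule has already been assigned to haplotype 'A' or 'B', and the function name `find_switch` and the `min_support` threshold are hypothetical. A real pipeline must also handle complications such as sequencing errors and several molecules sharing one barcode.

```python
from typing import List, Optional, Tuple


def find_switch(calls: List[Tuple[int, str]],
                min_support: int = 3) -> Optional[Tuple[int, int]]:
    """Locate a single haplotype switch along one barcoded molecule.

    calls: (position, haplotype) pairs for phased heterozygous SNPs covered by
           the molecule, where haplotype is 'A' or 'B'.
    min_support: hypothetical minimum number of consistent SNPs required on
                 each side of the switch.
    Returns the (left, right) SNP positions bracketing the switch if the
    molecule shows exactly one sufficiently supported switch, else None.
    """
    calls = sorted(calls)
    haps = [h for _, h in calls]
    # Indices where the haplotype differs from the preceding SNP.
    switches = [i for i in range(1, len(haps)) if haps[i] != haps[i - 1]]
    if len(switches) != 1:
        return None  # no switch (parental molecule) or several (noisy/ambiguous)
    i = switches[0]
    if i < min_support or len(haps) - i < min_support:
        return None  # too few SNPs on one side to trust the switch
    return calls[i - 1][0], calls[i][0]
```

Aggregating the returned intervals over many molecules, and normalizing by how many molecules span each genomic window, would give a rough recombination profile for the pooled gametes, in which hotspots appear as windows where switch intervals pile up.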
By bridging the gap between natural populations and lab organisms with
large clutch sizes and tractable genetic tools, this work shows the utility of the
stickleback system and provides important groundwork for further studies of
heterochiasmy and divergence in recombination during adaptation to differing
environments.
Improved quality metrics for association and reproducibility in chromatin accessibility data using mutual information
Abstract
Background: Correlation metrics are widely used in genomics analysis and are often applied with little regard to their assumptions of normality, homoscedasticity, and independence of values. This is especially true when comparing values between replicated sequencing experiments that probe chromatin accessibility, such as the assay for transposase-accessible chromatin using sequencing (ATAC-seq). Such data can contain many regions across the human genome with little to no sequencing depth and are thus non-normal, with a large proportion of zero values. Despite widespread use in the epigenomics field, few studies have evaluated and benchmarked how correlation and association statistics behave across ATAC-seq experiments with known differences, or the effects of removing specific outliers from the data. Here, we developed a computational simulation of ATAC-seq data to elucidate the behavior of correlation statistics and to compare their accuracy under set conditions of reproducibility.
Results: Using these simulations, we monitored the behavior of several correlation statistics, including Pearson's R and Spearman's ρ coefficients as well as Kendall's τ and Top-Down correlation. We also tested the behavior of association measures, including the coefficient of determination R², Kendall's W, and normalized mutual information. Our experiments reveal that most statistics, including Spearman's ρ, Kendall's τ, and Kendall's W, are insensitive to increasing differences between simulated ATAC-seq replicates. Removing co-zeros (regions lacking mapped sequencing reads in both simulated experiments) greatly improves the estimates of correlation and association. After removing co-zeros, the R² coefficient and normalized mutual information perform best, showing a closer one-to-one relationship with the known proportion of shared, enhanced loci between simulated replicates.
When comparing experimental ATAC-seq data using a random forest model, mutual information best predicts the relationships between ATAC-seq replicates.
Conclusions: Collectively, this study demonstrates how measures of correlation and association can behave in epigenomics experiments. We provide improved strategies for quantifying relationships in these increasingly prevalent and important chromatin accessibility assays.
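The abstract names the statistics but not how they are computed. The standard-library sketch below shows the co-zero filter and three of the measures (Pearson's R, Spearman's ρ, and normalized mutual information) applied to two replicate vectors of per-region read counts. The function names, the equal-width binning used to estimate mutual information from counts, the 10-bin default, and the normalization by √(H(X)·H(Y)) are assumptions chosen for illustration; the study's simulation and its exact estimators may differ.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def remove_co_zeros(x: Sequence[float], y: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Drop regions where both replicates have zero reads (co-zeros)."""
    kept = [(a, b) for a, b in zip(x, y) if not (a == 0 and b == 0)]
    return [a for a, _ in kept], [b for _, b in kept]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined for a constant vector
    return cov / (sx * sy)


def ranks(v: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    # Spearman's rho is Pearson's R computed on the (tie-averaged) ranks.
    return pearson(ranks(x), ranks(y))


def entropy(counts: Counter, n: int) -> float:
    return -sum(c / n * math.log(c / n) for c in counts.values())


def normalized_mutual_information(x: Sequence[float], y: Sequence[float],
                                  bins: int = 10) -> float:
    """I(X;Y) / sqrt(H(X) * H(Y)) after equal-width binning of each replicate."""
    def binned(v: Sequence[float]) -> List[int]:
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0  # constant input: put everything in bin 0
        return [min(int((a - lo) / width), bins - 1) for a in v]

    bx, by = binned(x), binned(y)
    n = len(bx)
    hx, hy = entropy(Counter(bx), n), entropy(Counter(by), n)
    hxy = entropy(Counter(zip(bx, by)), n)
    mi = hx + hy - hxy
    return mi / math.sqrt(hx * hy) if hx > 0 and hy > 0 else 0.0
```

As a toy illustration of why co-zeros matter, take replicates `x = [0]*8 + [5, 1, 3]` and `y = [0]*8 + [1, 5, 3]`: the eight shared zero regions make both Pearson's R and Spearman's ρ positive, even though the three regions with reads are perfectly anti-correlated. After `remove_co_zeros`, both coefficients become -1 (up to floating-point rounding).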