Detecting recombination and its mechanistic association with genomic features via statistical models
Recombination is a powerful weapon in the evolutionary arsenal of retroviruses such as HIV. It enables the production of chimeric variants or recombinants that may confer a selective advantage to the pathogen over the host immune response. Recombinants further accentuate differences in virulence, disease progression and drug resistance mutation patterns already observed in non-recombinant variants of HIV. This thesis describes the development of a rapid genotyper for HIV sequences employing supervised learning algorithms and its application to complex HIV recombinant data, the application of a hierarchical model for detection of recombination hotspots in the HIV-1 genome and the extension of this model enabling estimation of the association between recombination probabilities and covariates of interest.
The rapid genotyper for HIV-1 approaches the genotyping problem within the machine learning paradigm. Of the algorithms tested, the genotyper built using Bayesian additive regression trees (BART) was the most successful at efficiently classifying complex recombinants that pose a challenge to other currently available genotyping methods. We also developed a novel method, bootSMOTE, for generating synthetic data to supplement insufficient training data. We found that supplementation with synthetic recombinants especially boosts identification of complex recombinants. We describe the genotyper software, which is available for download, as well as a web interface enabling rapid classification of HIV-1 sequences.
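The abstract does not spell out how bootSMOTE works. For orientation only, the sketch below shows the classic SMOTE interpolation step that the name suggests it builds on (an assumption): each synthetic point is drawn on the line segment between a training sample and one of its k nearest neighbours. It assumes sequences have already been encoded as numeric feature vectors; the function name `smote` and its parameters are illustrative, and this is not the thesis's bootSMOTE.

```python
import numpy as np

def smote(X, n_new, k=5, seed=0):
    """Classic SMOTE interpolation (illustrative; NOT the thesis's bootSMOTE).

    Synthesizes n_new points, each on the segment between a randomly chosen
    sample of X and one of that sample's k nearest neighbours.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two samples to interpolate")
    k = min(k, n - 1)
    # Pairwise squared Euclidean distances between all samples.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)           # a sample is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]     # indices of the k nearest neighbours
    base = rng.integers(0, n, size=n_new)              # seed sample for each new point
    pick = nn[base, rng.integers(0, k, size=n_new)]    # one neighbour per seed
    gap = rng.random((n_new, 1))                       # u ~ Uniform(0, 1)
    return X[base] + gap * (X[pick] - X[base])
```

For example, `smote([[0, 0], [1, 0], [0, 1]], n_new=4, k=2)` returns four points, each lying on an edge of the triangle spanned by the three inputs, since with k=2 every sample's neighbours are the other two.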
Hotspots for recombination in the HIV-1 genome are modeled using spatially smoothed changepoint processes. This hierarchical model uses a phylogenetic recombination detection model of dual changepoint processes at the lower level. The upper level applies a Gaussian Markov random field (GMRF) hyperprior to population-level recombination probabilities in order to efficiently combine the information from many individual recombination events as inferred at the lower level. Focusing on 544 unique recombinant sequences, we found a novel hotspot in the pol gene of HIV-1 while confirming high recombination activity in the env gene.
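The abstract names a GMRF hyperprior but not its neighbourhood structure. As an assumed, illustrative choice, the sketch below uses a first-order random walk (RW1) GMRF on logit-transformed recombination probabilities θ_i along the genome. Its precision matrix τDᵀD (D being the first-difference matrix) penalizes differences between adjacent positions, so large jumps in θ are a priori unlikely; this is one way such a prior can pool sparse breakpoint evidence across neighbouring sites. The names `logit`, `rw1_precision`, `rw1_logdensity` and the precision τ are hypothetical.

```python
import numpy as np

def logit(p):
    """Map probabilities in (0, 1) to the real line."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))

def rw1_precision(n, tau):
    """Precision matrix tau * D^T D of an RW1 GMRF over n ordered sites,
    where D is the (n-1) x n first-difference matrix."""
    D = np.diff(np.eye(n), axis=0)   # row i is e_{i+1} - e_i
    return tau * D.T @ D

def rw1_logdensity(theta, tau):
    """Unnormalized log density of the intrinsic RW1 GMRF (rank n-1):
    (n-1)/2 * log(tau) - tau/2 * sum_i (theta[i+1] - theta[i])**2."""
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    return 0.5 * (n - 1) * np.log(tau) - 0.5 * tau * np.sum(np.diff(theta) ** 2)
```

Because each row of the RW1 precision matrix sums to zero, the prior is intrinsic (improper): it constrains only differences between adjacent θ values and leaves the overall level to the likelihood or to a separate intercept.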
Valuable insights into the molecular mechanism of recombination may be gained by extending the GMRF model to include covariates of interest. We add a level to the hierarchical model that allows simultaneous inference of recombination probabilities as well as their association with genomic covariates of interest. Using a set of 527 unique recombinants, we confirmed the presence of the pol hotspot. Interestingly, we found significant positive associations of spatial fluctuations in recombination probabilities with genomic regions prone to forming secondary structure, as well as significant negative associations with regions that support tight RNA-DNA hybrid formation. Overall, our results support the theory that pause sites along the genome promote recombination.
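One natural way to add covariates, assumed here because the abstract does not give the model equation, is to place them in a logistic linear predictor alongside the spatially smoothed residual: logit(p_i) = β0 + x_iᵀβ + ε_i, where ε follows a GMRF such as an RW1 prior. In this form, a positive coefficient on a secondary-structure score raises the recombination probability and a negative coefficient on an RNA-DNA hybrid stability score lowers it. The function name, covariate columns, and numeric values below are hypothetical.

```python
import numpy as np

def recombination_probs(X, beta0, beta, eps):
    """Hypothetical covariate model on the logit scale:
    logit(p_i) = beta0 + X[i] @ beta + eps[i],
    where eps is a spatially smoothed residual (e.g. an RW1 GMRF draw)."""
    X = np.asarray(X, dtype=float)
    eta = beta0 + X @ np.asarray(beta, dtype=float) + np.asarray(eps, dtype=float)
    return 1.0 / (1.0 + np.exp(-eta))  # inverse logit

# Illustrative covariates per genomic window (made-up values):
# column 0: secondary-structure propensity, column 1: RNA-DNA hybrid stability.
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
p = recombination_probs(X, beta0=0.0, beta=[0.8, -0.5], eps=np.zeros(3))
```

In a full Bayesian fit, β0, β, ε and the GMRF precision would be inferred jointly from the breakpoint evidence; the "significant" associations reported above would then correspond to posterior summaries of β under a criterion the abstract does not state.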
Recommended from our members
Eri1 regulates microRNA homeostasis and mouse lymphocyte development and antiviral function.
Natural killer (NK) cells play a critical role in early host defense against infected and transformed cells. Here, we show that mice deficient in Eri1, a conserved 3'-to-5' exoribonuclease that represses RNA interference, have a cell-intrinsic defect in NK-cell development and maturation. Eri1(-/-) NK cells displayed delayed acquisition of Ly49 receptors in the bone marrow (BM) and a selective reduction in Ly49D and Ly49H activating receptors in the periphery. Eri1 was required for immune-mediated control of mouse CMV (MCMV) infection. Ly49H(+) NK cells deficient in Eri1 failed to expand efficiently during MCMV infection, and virus-specific responses were also diminished among Eri1(-/-) T cells. We identified miRNAs as the major endogenous small RNA target of Eri1 in mouse lymphocytes. Both NK and T cells deficient in Eri1 displayed a global, sequence-independent increase in miRNA abundance. Ectopic Eri1 expression rescued defective miRNA expression in mature Eri1(-/-) T cells. Thus, mouse Eri1 regulates miRNA homeostasis in lymphocytes and is required for normal NK-cell development and antiviral immunity.
CHAPTER 21: Targeting of the rice transcriptome by TAL effectors of Xanthomonas oryzae
Two new complete genome sequences offer insight into host and tissue specificity of plant pathogenic Xanthomonas spp.
Xanthomonas is a large genus of bacteria that collectively cause disease on more than 300 plant species. The broad host range of the genus contrasts with stringent host and tissue specificity for individual species and pathovars. Whole-genome sequences were determined for Xanthomonas campestris pv. raphani strain 756C and X. oryzae pv. oryzicola strain BLS256, pathogens that infect the mesophyll tissue of the leading models for plant biology, Arabidopsis thaliana and rice, respectively; these sequences provided insight into the genetic determinants of host and tissue specificity. Comparisons were made with genomes of closely related strains that infect the vascular tissue of the same hosts and across a larger collection of complete Xanthomonas genomes. The results suggest a model in which complex sets of adaptations at the level of gene content account for host specificity, while subtler adaptations at the level of amino acid or noncoding regulatory nucleotide sequence determine tissue specificity.