7 research outputs found

    Phylogenetic inference under recombination using Bayesian stochastic topology selection

    Motivation: Conventional phylogenetic analysis for characterizing the relatedness between taxa typically assumes that a single relationship exists between species at every site along the genome. This assumption fails to take into account recombination, which is a fundamental process for generating diversity and can lead to spurious results. Recombination induces a localized phylogenetic structure which may vary along the genome. Here, we generalize a hidden Markov model (HMM) to infer changes in phylogeny along multiple sequence alignments while accounting for rate heterogeneity; the hidden states refer to the unobserved phylogenetic topology underlying the relatedness at a genomic location. The dimensionality of the number of hidden states (topologies) and their structure are random (not known a priori) and are sampled using Markov chain Monte Carlo algorithms. The HMM structure allows us to analytically integrate out over all possible changepoints in topologies as well as all the unknown branch lengths.
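
To illustrate how an HMM can sum over every possible placement of topology changepoints, here is a minimal sketch of the standard forward algorithm in log space. It is not the authors' method: it assumes a fixed, known set of candidate topologies, per-site log-likelihoods that are already computed, a uniform initial distribution, and a single hypothetical `stay_prob` parameter (strictly between 0 and 1) for remaining in the same topology between adjacent sites. The abstract's model additionally samples the number of topologies by MCMC and integrates out branch lengths, neither of which is attempted here.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_marginal(site_loglik, stay_prob):
    """Log marginal likelihood of an alignment, summed over all topology paths.

    site_loglik[t][k] is log P(column t | topology k) (assumed given).
    stay_prob is the assumed probability of keeping the same topology
    at the next site; switches are spread uniformly over the others.
    """
    n_states = len(site_loglik[0])
    if n_states == 1:
        # Only one topology: no changepoints are possible.
        return sum(row[0] for row in site_loglik)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_states - 1))
    # Uniform prior over topologies at the first site.
    alpha = [ll - math.log(n_states) for ll in site_loglik[0]]
    for row in site_loglik[1:]:
        alpha = [
            row[k] + logsumexp(
                [alpha[j] + (log_stay if j == k else log_switch)
                 for j in range(n_states)]
            )
            for k in range(n_states)
        ]
    return logsumexp(alpha)
```

The recursion costs O(sites × topologies²), whereas enumerating every changepoint configuration explicitly would grow exponentially with alignment length.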

    Statistical analysis on detecting recombination sites in DNA-beta satellites associated with the old world geminiviruses

    Although the exchange of genetic information by recombination plays an important role in the evolution of viruses, it is not clear how it generates diversity. Geminiviruses are plant viruses that have ambisense single-stranded circular DNA genomes and are among the most economically important plant viruses in agricultural production. Small circular single-stranded DNA satellites, termed DNA-β, have recently been found associated with some geminivirus infections. In this paper we analyze a satellite molecule, DNA-β, of geminiviruses for recombination events using phylogenetic and statistical analysis. We find that one strain from ToLCMaB has a recombination pattern and is possibly a recombinant molecule between two strains from two species, PaLCuB-[IN:Chi:05] (major parent) and ToLCB-[IN:CP:04] (minor parent). Comment: 8 figures and 2 tables. To appear in Frontiers in Systems Biology

    Distinguishing regional from within-codon rate heterogeneity in DNA sequence alignments

    We present an improved phylogenetic factorial hidden Markov model (FHMM) for detecting two types of mosaic structures in DNA sequence alignments, related to (1) recombination and (2) rate heterogeneity. The focus of the present work is on improving the modelling of the latter aspect. Earlier papers have modelled different degrees of rate heterogeneity with separate hidden states of the FHMM. This approach fails to appreciate the intrinsic difference between two types of rate heterogeneity: long-range regional effects, which are potentially related to differences in the selective pressure, and the short-term periodic patterns within the codons, which merely capture the signature of the genetic code. We propose an improved model that explicitly distinguishes between these two effects, and we assess its performance on a set of simulated DNA sequence alignments.

    Pharmacogenomics: Overview, Applications, and Recent Developments

    Pharmacogenomics is the study of how an individual's genes affect their response to drugs. It is an emerging field that combines pharmacology (the study of drugs) with genomics (the study of genes) to develop effective doses and safe medications tailored to an individual patient's genetic makeup. The Human Genome Project is one of the crucial projects through which researchers are learning how genes relate to the body's response to medications. Differences in genetic makeup lead to differences in the effectiveness of medication, and may in future be used to predict how effective a medication will be for an individual and to study the occurrence of adverse drug reactions. Despite advances in science and technology, pharmacogenomics is still in its infancy. Its use is currently limited, but novel approaches are in clinical trials. In the near future, pharmacogenomics will enable the development of tailor-made therapeutics for widespread health problems such as neurodegenerative and cardiovascular disorders, HIV, cancer, and asthma.

    Detecting Phylogenetic Breakpoints and Discordance from Genome-Wide Alignments for Species Tree Reconstruction

    With the easy acquisition of sequence data, it is now possible to obtain and align whole genomes across multiple related species or populations. In this work, I assess the performance of a statistical method to reconstruct the whole distribution of phylogenetic trees along the genome, estimate the proportion of the genome for which a given clade is true, and infer a concordance tree that summarizes the dominant vertical inheritance pattern. There are two main issues when dealing with whole-genome alignments, as opposed to multiple genes: the size of the data and the detection of recombination breakpoints. These breakpoints partition the genomic alignment into phylogenetically homogeneous loci, where sites within a given locus all share the same phylogenetic tree topology. To delimit these loci, I describe here a method based on the minimum description length (MDL) principle, implemented with dynamic programming for computational efficiency. Simulations show that combining MDL partitioning with Bayesian concordance analysis provides an efficient and robust way to estimate both the vertical inheritance signal and the horizontal phylogenetic signal. The method performed well both in the presence of incomplete lineage sorting and in the presence of horizontal gene transfer. A high level of systematic bias was found here, highlighting the need for good individual tree building methods, which form the basis for more elaborate gene tree/species tree reconciliation methods.
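
The idea of choosing breakpoints by dynamic programming can be sketched as a generic penalized partitioning problem. The code below is an assumption-laden illustration, not the author's implementation: `best_partition`, `mismatch_cost` and the fixed per-locus `penalty` are hypothetical names, and the toy cost (sites disagreeing with the majority topology label of their segment) merely stands in for a real description length, which would be derived from tree likelihoods and model complexity.

```python
def best_partition(n_sites, segment_cost, penalty):
    """Minimise the sum of segment costs plus a fixed penalty per segment.

    segment_cost(i, j) is the (assumed) cost of treating sites i..j-1
    as one locus. Returns (total_cost, interior_breakpoints).
    """
    best = [0.0] + [float("inf")] * n_sites   # best[j]: optimum for sites 0..j-1
    back = [0] * (n_sites + 1)                # back[j]: start of the last segment
    for j in range(1, n_sites + 1):
        for i in range(j):
            candidate = best[i] + segment_cost(i, j) + penalty
            if candidate < best[j]:
                best[j] = candidate
                back[j] = i
    # Walk the back-pointers to recover the segment ends.
    ends = []
    j = n_sites
    while j > 0:
        ends.append(j)
        j = back[j]
    ends.reverse()
    return best[n_sites], ends[:-1]           # drop the final end (alignment end)

def mismatch_cost(labels):
    """Toy stand-in for a description length: the number of sites in a
    segment whose label differs from the segment's majority label."""
    def cost(i, j):
        segment = labels[i:j]
        return len(segment) - max(segment.count(x) for x in set(segment))
    return cost

# Hypothetical per-site "preferred topology" labels for a six-site alignment.
labels = [0, 0, 0, 1, 1, 1]
total, breakpoints = best_partition(len(labels), mismatch_cost(labels), 0.5)
```

This version evaluates every pair of segment boundaries, so it makes O(n²) cost calls. The penalty plays the role of the model-complexity term: raising it yields fewer, longer loci.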

    Classification of phylogenetic data via Bayesian mixture modelling

    Conventional probabilistic models for phylogenetic inference assume that an evolutionary tree, and a single set of branch lengths and stochastic process of DNA evolution are sufficient to characterise the generating process across an entire DNA alignment. Unfortunately such a simplistic, homogeneous formulation may be a poor description of reality when the data arise from heterogeneous processes. A well-known example is when sites evolve at heterogeneous rates. This thesis is a contribution to the modelling and understanding of heterogeneity in phylogenetic data. We propose a method for the classification of DNA sites based on Bayesian mixture modelling. Our method not only accounts for heterogeneous data but also identifies the underlying classes and enables their interpretation. We also introduce novel MCMC methodology with the same, or greater, estimation performance than existing algorithms but with lower computational cost. We find that our mixture model can successfully detect evolutionary heterogeneity and demonstrate its direct relevance by applying it to real DNA data. One of these applications is the analysis of sixteen strains of one of the bacterial species that cause Lyme disease. Results from that analysis have helped in understanding the evolutionary paths of these bacterial strains and, therefore, the dynamics of the spread of Lyme disease. Our method is discussed in the context of DNA but it may be extended to other types of molecular data. Moreover, the classification scheme that we propose is evidence of the breadth of application of mixture modelling and a step forwards in the search for more realistic models of the processes that underlie phylogenetic data. EThOS - Electronic Theses Online Service, GB, United Kingdom

    Understanding the evolutionary history of the papillomaviruses

    This thesis focuses on the evolutionary history of the papillomaviruses (PVs) using phylogenetic approaches. Two aspects have been examined: the first is the level of phylogenetic compatibility among PV genes and the second is determining the ancestral diversification mechanisms of the PVs in order to explain the origin of the observed associations with host species. Bayesian phylogenetic analysis has been used to make evolutionary inferences. The existence of phylogenetic compatibility among genes was examined by estimating constrained and unconstrained phylogenies for pairs of PV genes. The Bayes' factor statistic derived from comparison of the constrained and unconstrained models indicated significant evidence against identical phylogenies between any of the 6 PV genes investigated and may indicate the existence of ancestral recombination events. The formation of new host-virus associations can occur via a process of 'codivergence', where, following host speciation, the ancestral virus association is effectively inherited by the descendant host species; 'prior divergence' of the virus, which results in multiple virus associations with the host; and 'host transfer', in which the virus lineage is transferred between contemporaneous host species. To distinguish between these mechanisms of virus diversification, an approach based on temporal comparisons of host and virus divergence times was devised. Difficulties associated with the direct estimation of PV divergence times led to the incorporation of a biased sampling approach into Bayesian phylogenetic estimation. This allowed for viral divergence events to be biased in favour of codivergence but allowed sampling of times that violate this assumption and therefore indicate either prior divergence or host transfer. Statistical evaluation of the proportion of violations at each viral divergence identified significant evidence of prior divergence events behind many of the observed PV-host associations and one ancestral host transfer event.