54 research outputs found

    Evidence of animal mtDNA recombination between divergent populations of the potato cyst nematode Globodera pallida

    Recombination is typically assumed to be absent in animal mitochondrial genomes (mtDNA). However, the maternal mode of inheritance means that recombinant products are indistinguishable from their progenitor molecules. The majority of studies of mtDNA recombination assess past recombination events, where patterns of recombination are inferred by comparing the mtDNA of different individuals. Few studies assess contemporary mtDNA recombination, where recombinant molecules are observed as direct mosaics of known progenitor molecules. Here we use the potato cyst nematode, Globodera pallida, to investigate past and contemporary recombination. Past recombination was assessed within and between populations of G. pallida, and contemporary recombination was assessed in the progeny of experimental crosses of these populations. Breeding of genetically divergent organisms may cause paternal mtDNA leakage, resulting in heteroplasmy and facilitating the detection of recombination. To assess contemporary recombination we looked for evidence of recombination between the mtDNA of the parental populations within the mtDNA of progeny. Past recombination was detected between a South American population and several UK populations of G. pallida, as well as between two South American populations. This suggests that these populations may have interbred, paternal mtDNA leakage occurred, and the mtDNA of these populations subsequently recombined. This evidence challenges two dogmas of animal mtDNA evolution: the absence of recombination and strictly maternal inheritance. No contemporary recombination between the parental populations was detected in the progeny of the experimental crosses. This supports current arguments that mtDNA recombination events are rare. More sensitive detection methods may be required to adequately assess contemporary mtDNA recombination in animals.

    OrgConv: detection of gene conversion using consensus sequences and its application in plant mitochondrial and chloroplast homologs

    Background: The ancestry of mitochondria and chloroplasts traces back to separate endosymbioses of once free-living bacteria. The highly reduced genomes of these two organelles therefore contain very distant homologs that only recently have been shown to recombine inside the mitochondrial genome. Detection of gene conversion between mitochondrial and chloroplast homologs was previously impossible due to the lack of suitable computer programs. Recently, I developed a novel method and have, for the first time, discovered recurrent gene conversion between chloroplast and mitochondrial genes. The method will further our understanding of plant organellar genome evolution and help identify and remove gene regions with incongruent phylogenetic signals for several genes widely used in plant systematics. Here, I implement such a method that is available in a user-friendly web interface. Results: OrgConv (Organellar Conversion) is a computer package developed for detection of gene conversion between mitochondrial and chloroplast homologous genes. OrgConv is available in two forms: source code can be installed and run on a Linux platform, and a web interface is available on multiple operating systems. The input files of the feature program are two multiple sequence alignments from different organellar compartments in FASTA format. The program compares every examined sequence against the consensus sequence of each sequence alignment rather than exhaustively examining every possible combination. Making use of consensus sequences significantly reduces the number of comparisons and therefore reduces overall computational time, which allows for analysis of very large datasets. Most importantly, with the significantly reduced number of comparisons, the statistical power remains high in the face of correction for multiple tests. Conclusions: Both the source code and the web interface of OrgConv are available for free from the OrgConv website http://www.indiana.edu/~orgconv. Although OrgConv has been developed with main focus on detection of gene conversion between mitochondrial and chloroplast genes, it may also be used for detection of gene conversion between any two distinct groups of homologous sequences.
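
    The consensus-based comparison described in this abstract can be illustrated with a minimal sketch. This is not OrgConv's code: the function names, the majority-rule consensus, and the comparison counts are assumptions made for illustration. In particular, the count assumes that "exhaustive" means every cross-group pair and that each sequence is checked against both consensus sequences.

```python
from collections import Counter


def consensus(alignment):
    """Majority-rule consensus of equal-length aligned sequences.

    Ties are broken in favour of the character seen first in the column
    (hypothetical rule, chosen only to keep the sketch deterministic).
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))


def mismatches(seq_a, seq_b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(seq_a, seq_b))


def comparison_counts(n_a, n_b):
    """Comparisons needed for two groups of n_a and n_b sequences.

    Returns (exhaustive cross-group pairs, sequence-vs-consensus checks),
    under the assumptions stated in the lead-in.
    """
    return n_a * n_b, 2 * (n_a + n_b)


mito = ["ACGT", "ACGA", "TCGA"]
print(consensus(mito))
print(comparison_counts(100, 100))
```

    Fewer comparisons mean a milder multiple-testing correction, which is the reason the abstract gives for the retained statistical power.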

    Distinct repeat motifs at the C-terminal region of CagA of Helicobacter pylori strains isolated from diseased patients and asymptomatic individuals in West Bengal, India

    Background: Infection with Helicobacter pylori strains that express CagA is associated with gastritis, peptic ulcer disease, and gastric adenocarcinoma. The biological function of CagA depends on tyrosine phosphorylation by a cellular kinase. The phosphate acceptor tyrosine moiety is present within the EPIYA motif at the C-terminal region of the protein. This region is highly polymorphic due to variations in the number of EPIYA motifs and the polymorphism found in spacer regions among EPIYA motifs. The aim of this study was to analyze the polymorphism at the C-terminal end of CagA and to evaluate its association with the clinical status of the host in West Bengal, India. Results: Seventy-seven H. pylori strains isolated from patients with various clinical statuses were used to characterize the C-terminal polymorphic region of CagA. Our analysis showed that there is no correlation between the previously described CagA types and various disease outcomes in the Indian context. Further analyses of different CagA structures revealed that the repeat units in the spacer sequences within the EPIYA motifs are actually more discrete than the previously proposed models of CagA variants. Conclusion: Our analyses suggest that EPIYA motifs as well as the spacer sequence units are present as distinct insertions and deletions, which possibly have arisen from extensive recombination events. Moreover, we have identified several new CagA types, which could not be typed by the existing systems, and we have therefore proposed a new typing system. We hypothesize that a cagA gene encoding a higher number of EPIYA motifs may have arisen from cagA genes that encode fewer EPIYA motifs by acquisition of DNA segments through recombination events.

    Analysis of Next-generation Sequencing Data in Virology - Opportunities and Challenges

    Viruses are the most abundant and the smallest organisms, which are relatively simple to sequence. Genome sequence data of viruses for individual species to populations outnumber that of other species. Although this offers an opportunity to study viral diversity at varying levels of taxonomic hierarchy, it also poses challenges for systematic and structured organization of data and its downstream processing. Extensive computational analyses using a number of algorithms and programs have opened exciting opportunities for virus discovery and diagnostics, apart from augmenting our understanding of the intriguing world of viruses. Unravelling evolutionary dynamics of viruses permits improved understanding of phenomena such as quasispecies diversity, role of mutations in host switching and drug resistance, which enables the tangible measurements of genotype and phenotype of viruses. Improved understanding of geno-/serotype diversity in correlation with antigenic diversity will facilitate rational design and development of efficacious vaccines against emerging and re-emerging viruses. Mathematical models developed using the genomic data could be used to predict the spread of viruses due to vector switching and the (re)emergence due to host switching and, thereby, contribute towards designing public health policies for disease management and control

    Genome-wide detection and analysis of homologous recombination among sequenced strains of Escherichia coli

    BACKGROUND: Comparisons of complete bacterial genomes reveal evidence of lateral transfer of DNA across otherwise clonally diverging lineages. Some lateral transfer events result in acquisition of novel genomic segments and are easily detected through genome comparison. Other more subtle lateral transfers involve homologous recombination events that result in substitution of alleles within conserved genomic regions. This type of event is observed infrequently among distantly related organisms. It is reported to be more common within species, but the frequency has been difficult to quantify since the sequences under comparison tend to have relatively few polymorphic sites. RESULTS: Here we report a genome-wide assessment of homologous recombination among a collection of six complete Escherichia coli and Shigella flexneri genome sequences. We construct a whole-genome multiple alignment and identify clusters of polymorphic sites that exhibit atypical patterns of nucleotide substitution using a random walk-based method. The analysis reveals one large segment (approximately 100 kb) and 186 smaller clusters of single base pair differences that suggest lateral exchange between lineages. These clusters include portions of 10% of the 3,100 genes conserved in six genomes. Statistical analysis of the functional roles of these genes reveals that several classes of genes are over-represented, including those involved in recombination, transport and motility. CONCLUSION: We demonstrate that intraspecific recombination in E. coli is much more common than previously appreciated and may show a bias for certain types of genes. The described method provides high-specificity, conservative inference of past recombination events
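
    The idea of finding clusters of polymorphic sites with a random walk can be sketched as follows. This is a generic maximal-scoring-segment scan, not the method of the paper: the scoring scheme (a reward for each polymorphic site, a penalty for each conserved site) and the parameter values are hypothetical.

```python
def densest_cluster(polymorphic, match_score=1.0, gap_penalty=0.25):
    """Find the highest-scoring run of sites in a 0/1 polymorphism track.

    The running score rises at polymorphic sites and falls at conserved
    ones. Whenever it drops to zero or below, the walk restarts. The
    returned span is half-open: (start, end).
    """
    best_score, best_span = 0.0, None
    running, start = 0.0, 0
    for i, is_poly in enumerate(polymorphic):
        if running <= 0:
            # Restart the walk: nothing before this site is worth keeping.
            running, start = 0.0, i
        running += match_score if is_poly else -gap_penalty
        if running > best_score:
            best_score, best_span = running, (start, i + 1)
    return best_span, best_score


# Toy track: conserved flanks around a dense patch of differences.
sites = [0] * 10 + [1, 1, 0, 1, 1] + [0] * 10
print(densest_cluster(sites))
```

    A real analysis would also need a significance threshold for the segment score, which this sketch leaves out.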

    Improved Bayesian methods for detecting recombination and rate heterogeneity in DNA sequence alignments

    DNA sequence alignments are usually not homogeneous. Mosaic structures may result as a consequence of recombination or rate heterogeneity. Interspecific recombination, in which DNA subsequences are transferred between different (typically viral or bacterial) strains may result in a change of the topology of the underlying phylogenetic tree. Rate heterogeneity corresponds to a change of the nucleotide substitution rate. Various methods for simultaneously detecting recombination and rate heterogeneity in DNA sequence alignments have recently been proposed, based on complex probabilistic models that combine phylogenetic trees with factorial hidden Markov models or multiple changepoint processes. The objective of my thesis is to identify potential shortcomings of these models and explore ways of how to improve them. One shortcoming that I have identified is related to an approximation made in various recently proposed Bayesian models. The Bayesian paradigm requires the solution of an integral over the space of parameters. To render this integration analytically tractable, these models assume that the vectors of branch lengths of the phylogenetic tree are independent among sites. While this approximation reduces the computational complexity considerably, I show that it leads to the systematic prediction of spurious topology changes in the Felsenstein zone, that is, the area in the branch lengths configuration space where maximum parsimony consistently infers the wrong topology due to long-branch attraction. I demonstrate these failures by using two Bayesian hypothesis tests, based on an inter- and an intra-model approach to estimating the marginal likelihood. I then propose a revised model that addresses these shortcomings, and demonstrate its improved performance on a set of synthetic DNA sequence alignments systematically generated around the Felsenstein zone. 
The core model explored in my thesis is a phylogenetic factorial hidden Markov model (FHMM) for detecting two types of mosaic structures in DNA sequence alignments, related to recombination and rate heterogeneity. The focus of my work is on improving the modelling of the latter aspect. Earlier research efforts by other authors have modelled different degrees of rate heterogeneity with separate hidden states of the FHMM. Their work fails to appreciate the intrinsic difference between two types of rate heterogeneity: long-range regional effects, which are potentially related to differences in the selective pressure, and the short-term periodic patterns within the codons, which merely capture the signature of the genetic code. I have improved these earlier phylogenetic FHMMs in two respects. Firstly, by sampling the rate vector from the posterior distribution with RJMCMC I have made the modelling of regional rate heterogeneity more flexible, and I infer the number of different degrees of divergence directly from the DNA sequence alignment, thereby dispensing with the need to arbitrarily select this quantity in advance. Secondly, I explicitly model within-codon rate heterogeneity via a separate rate modification vector. In this way, the within-codon effect of rate heterogeneity is imposed on the model a priori, which facilitates the learning of the biologically more interesting effect of regional rate heterogeneity a posteriori. I have carried out simulations on synthetic DNA sequence alignments, which have borne out my conjecture. The existing model, which does not explicitly include the within-codon rate variation, has to model both effects with the same modelling mechanism. As expected, it was found to fail to disentangle these two effects. On the contrary, I have found that my new model clearly separates within-codon rate variation from regional rate heterogeneity, resulting in more accurate predictions

    Detecting Phylogenetic Breakpoints and Discordance from Genome-Wide Alignments for Species Tree Reconstruction

    With the easy acquisition of sequence data, it is now possible to obtain and align whole genomes across multiple related species or populations. In this work, I assess the performance of a statistical method to reconstruct the whole distribution of phylogenetic trees along the genome, estimate the proportion of the genome for which a given clade is true, and infer a concordance tree that summarizes the dominant vertical inheritance pattern. There are two main issues when dealing with whole-genome alignments, as opposed to multiple genes: the size of the data and the detection of recombination breakpoints. These breakpoints partition the genomic alignment into phylogenetically homogeneous loci, where sites within a given locus all share the same phylogenetic tree topology. To delimitate these loci, I describe here a method based on the minimum description length (MDL) principle, implemented with dynamic programming for computational efficiency. Simulations show that combining MDL partitioning with Bayesian concordance analysis provides an efficient and robust way to estimate both the vertical inheritance signal and the horizontal phylogenetic signal. The method performed well both in the presence of incomplete lineage sorting and in the presence of horizontal gene transfer. A high level of systematic bias was found here, highlighting the need for good individual tree building methods, which form the basis for more elaborate gene tree/species tree reconciliation methods
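
    Partitioning an alignment into homogeneous loci by dynamic programming can be sketched in a few lines. The cost below is a stand-in, not a true description length: each site is assumed to carry a label for the topology it supports, a segment costs the number of sites disagreeing with its majority label, and every segment pays a fixed penalty. All names and the penalty value are hypothetical.

```python
from collections import Counter


def segment_cost(labels):
    """Sites in the segment that disagree with its majority label."""
    return len(labels) - max(Counter(labels).values())


def partition(labels, penalty):
    """Split labels into contiguous segments minimising total cost.

    best[end] holds the cheapest cost of partitioning labels[:end];
    back[end] records where the last segment of that solution starts.
    Returns a list of half-open (start, end) segments.
    """
    n = len(labels)
    best = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(end):
            cost = best[start] + segment_cost(labels[start:end]) + penalty
            if cost < best[end]:
                best[end], back[end] = cost, start
    # Walk the back-pointers from the end to recover the segments.
    segments, end = [], n
    while end > 0:
        segments.append((back[end], end))
        end = back[end]
    return segments[::-1]


print(partition(list("AAAAABBBBB"), penalty=1.5))
print(partition(list("AAABAAA"), penalty=1.5))
```

    The penalty plays the role of the model-complexity term: a single discordant site is cheaper to absorb than to isolate, whereas a sustained change of label justifies a breakpoint. This naive version recomputes each segment cost from scratch and is therefore cubic in the number of sites.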

    Comparative genomics and concerted evolution of β-tubulin paralogs in Leishmania spp

    BACKGROUND: Tubulin isotypes and expression patterns are highly regulated in diverse organisms. The genome sequence of the protozoan parasite Leishmania major contains three distinct β-tubulin loci. To investigate the diversity of β-tubulin genes, we have compared the published genome sequence to draft genome sequences of two further species L. infantum and L. braziliensis. Untranscribed regions and coding sequences for each isoform were compared within and between species in relation to the known diversity of β-tubulin transcripts in Leishmania spp. RESULTS: All three β-tubulin loci were present in L. infantum and L. braziliensis, showing conserved synteny with the L. major sequence, hence confirming that these loci are paralogous. Flanking regions suggested that the chromosome 21 locus is an amastigote-specific isoform and more closely related (either structurally or functionally) to the chromosome 33 'array' locus than the chromosome 8 locus. A phylogenetic network of all isoforms indicated that paralogs from L. braziliensis and L. mexicana were monophyletic, rather than clustering by locus. CONCLUSION: L. braziliensis and L. mexicana sequences appeared more similar to each other than each did to its closest relative in another species; this indicates that these sequences have evolved convergently in each species, perhaps through ectopic gene conversion; a process not yet evident among the more recently derived L. major and L. infantum isoforms. The distinctive non-coding regions of each β-tubulin locus showed that it is the regulatory regions of these loci that have evolved most during the diversification of these genes in Leishmania, while the coding regions have been conserved and concerted. The various loci in Leishmania satisfy a need for innovative expression of β-tubulin, rather than elaboration of its structural role

    Synonymous and Nonsynonymous Distances Help Untangle Convergent Evolution and Recombination

    When estimating a phylogeny from a multiple sequence alignment, researchers often assume the absence of recombination. However, if recombination is present, then tree estimation and all downstream analyses will be impacted, because different segments of the sequence alignment support different phylogenies. Similarly, convergent selective pressures at the molecular level can also lead to phylogenetic tree incongruence across the sequence alignment. Current methods for detection of phylogenetic incongruence are not equipped to distinguish between these two different mechanisms and assume that the incongruence is a result of recombination or other horizontal transfer of genetic information. We propose a new recombination detection method that can make this distinction, based on synonymous codon substitution distances. Although some power is lost by discarding the information contained in the nonsynonymous substitutions, our new method has lower false positive probabilities than the comparable recombination detection method when the phylogenetic incongruence signal is due to convergent evolution. We apply our method to three empirical examples, where we analyze: 1) sequences from a transmission network of the human immunodeficiency virus, 2) tlpB gene sequences from a geographically diverse set of 38 Helicobacter pylori strains, and 3) Hepatitis C virus sequences sampled longitudinally from one patient.

    High amino acid diversity and positive selection at a putative coral immunity gene (tachylectin-2)

    BACKGROUND: Genes involved in immune functions, including pathogen recognition and the activation of innate defense pathways, are among the most genetically variable known, and the proteins that they encode are often characterized by high rates of amino acid substitutions, a hallmark of positive selection. The high levels of variation characteristic of immunity genes make them useful tools for conservation genetics. To date, highly variable immunity genes have yet to be found in corals, keystone organisms of the world's most diverse marine ecosystem, the coral reef. Here, we examine variation in and selection on a putative innate immunity gene from Oculina, a coral genus previously used as a model for studies of coral disease and bleaching. RESULTS: In a survey of 244 Oculina alleles, we find high nonsynonymous variation and a signature of positive selection, consistent with a putative role in immunity. Using computational protein structure prediction, we generate a structural model of the Oculina protein that closely matches the known structure of tachylectin-2 from the Japanese horseshoe crab (Tachypleus tridentatus), a protein with demonstrated function in microbial recognition and agglutination. We also demonstrate that at least three other genera of anthozoan cnidarians (Acropora, Montastrea and Nematostella) possess proteins structurally similar to tachylectin-2. CONCLUSIONS: Taken together, the evidence of high amino acid diversity, positive selection and structural correspondence to the horseshoe crab tachylectin-2 suggests that this protein is 1) part of Oculina's innate immunity repertoire, and 2) evolving adaptively, possibly under selective pressure from coral-associated microorganisms. Tachylectin-2 may serve as a candidate locus to screen coral populations for their capacity to respond adaptively to future environmental change