5,860 research outputs found

    Parametric inference of recombination in HIV genomes

    Recombination is an important event in the evolution of HIV. It affects the global spread of the pandemic as well as evolutionary escape from the host immune response and from drug therapy within single patients. Comprehensive computational methods are needed for detecting recombinant sequences in large databases, and for inferring the parental sequences. We present a hidden Markov model to annotate a query sequence as a recombinant of a given set of aligned sequences. Parametric inference is used to determine all optimal annotations for all parameters of the model. We show that the inferred annotations recover most features of established hand-curated annotations. Thus, parametric analysis of the hidden Markov model is feasible for HIV full-length genomes, and it improves the detection and annotation of recombinant forms. All computational results, reference alignments, and C++ source code are available at http://bio.math.berkeley.edu/recombination/. Comment: 20 pages, 5 figures

    Decomposing the site frequency spectrum: the impact of tree topology on neutrality tests

    We investigate the dependence of the site frequency spectrum (SFS) on the topological structure of genealogical trees. We show that basic population genetic statistics - for instance estimators of θ or neutrality tests such as Tajima's D - can be decomposed into components of waiting times between coalescent events and of tree topology. Our results clarify the relative impact of the two components on these statistics. We provide a rigorous interpretation of positive or negative values of an important class of neutrality tests in terms of the underlying tree shape. In particular, we show that values of Tajima's D and Fay and Wu's H depend in a direct way on a peculiar measure of tree balance which is mostly determined by the root balance of the tree. We present a new test for selection in the same class as Fay and Wu's H and discuss its interpretation and power. Finally, we determine the trees corresponding to extreme expected values of these neutrality tests and present formulae for these extreme values as a function of sample size and number of segregating sites. Comment: 23 pages, 8 figures

    Organised Genome Dynamics in the Escherichia coli Species Results in Highly Diverse Adaptive Paths

    The Escherichia coli species represents one of the best-studied model organisms, but also encompasses a variety of commensal and pathogenic strains that diversify by high rates of genetic change. We uniformly (re-)annotated the genomes of 20 commensal and pathogenic E. coli strains and one strain of E. fergusonii (the closest E. coli-related species), including seven that we sequenced to completion. Within the ∼18,000 families of orthologous genes, we found ∼2,000 common to all strains. Although recombination rates are much higher than mutation rates, we show, both theoretically and using phylogenetic inference, that this does not obscure the phylogenetic signal, which places the B2 phylogenetic group and one group D strain at the basal position. Based on this phylogeny, we inferred past evolutionary events of gain and loss of genes, identifying functional classes under opposite selection pressures. We found an important adaptive role for metabolism diversification within group B2 and Shigella strains, but identified few or no extraintestinal virulence-specific genes, which could make it difficult to develop a vaccine against extraintestinal infections. Genome flux in E. coli is confined to a small number of conserved positions in the chromosome, which most often are not associated with integrases or tRNA genes. Core genes flanking some of these regions show higher rates of recombination, suggesting that a gene, once acquired by a strain, spreads within the species by homologous recombination at the flanking genes. Finally, the genome's long-scale structure of recombination indicates lower recombination rates, but not higher mutation rates, at the terminus of replication. The ensuing effect of background selection and biased gene conversion may thus explain why this region is A+T-rich and shows high sequence divergence but low sequence polymorphism. Overall, despite a very high gene flow, genes co-exist in an organised genome.

    Genome landscapes and bacteriophage codon usage

    Across all kingdoms of biological life, protein-coding genes exhibit unequal usage of synonymous codons. Although alternative theories abound, translational selection has been accepted as an important mechanism that shapes the patterns of codon usage in prokaryotes and simple eukaryotes. Here we analyze patterns of codon usage across 74 diverse bacteriophages that infect E. coli, P. aeruginosa and L. lactis as their primary host. We introduce the concept of a "genome landscape," which helps reveal non-trivial, long-range patterns in codon usage across a genome. We develop a series of randomization tests that allow us to interrogate the significance of one aspect of codon usage, such as GC content, while controlling for another aspect, such as adaptation to host-preferred codons. We find that 33 phage genomes exhibit highly non-random patterns in their GC3-content, use of host-preferred codons, or both. We show that the head and tail proteins of these phages exhibit significant bias towards host-preferred codons, relative to the non-structural phage proteins. Our results support the hypothesis of translational selection on viral genes for host-preferred codons, over a broad range of bacteriophages. Comment: 9 Color Figures, 5 Tables, 53 References

    Comparative genomics of rumen methanogens : a thesis presented in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biochemistry at Massey University, Palmerston North, New Zealand

    Methane (CH4) emissions from agriculture represent around 9% of global anthropogenic greenhouse gas emissions. The single largest source of this CH4 is animal enteric fermentation, predominantly from ruminant livestock, where it is produced mainly in their fermentative forestomach (or reticulo-rumen) by a group of archaea known as methanogens. In order to reduce CH4 emissions from ruminants, it is necessary to understand the role of methanogenic archaea in the rumen, and to identify their distinguishing characteristics that can be used to develop CH4 mitigation technologies. To gain insights into the role of methanogens in the rumen environment, two methanogens were isolated from ovine rumen and their genomes were sequenced: methanogenic archaeon ISO4-H5 of the order Methanomassiliicoccales and Methanobrevibacter sp. D5 of the Methanobrevibacter gottschalkii clade. Genomic analysis suggests ISO4-H5 is an obligate hydrogen-dependent methylotrophic methanogen, able to use methanol and methylamines as substrates for methanogenesis. Like other organisms within this order, ISO4-H5 does not possess genes required for the first six steps of hydrogenotrophic methanogenesis. Comparison between the genomes of different members of the order Methanomassiliicoccales revealed strong conservation in energy metabolism, particularly in genes of the methylotrophic methanogenesis pathway, as well as in the biosynthesis and use of pyrrolysine. Unlike members of Methanomassiliicoccales from human sources, ISO4-H5 does not contain the genes required for production of coenzyme M (CoM), and requires an external supply of CoM to survive. Methanobrevibacter sp. D5 is a hydrogenotrophic methanogen predicted to utilise CO2 + H2 and formate as substrates. Comparisons between the available Methanobrevibacter genomes have revealed a high conservation in energy metabolism and characteristics specific to each clade.
    The coexistence of different Methanobrevibacter species in the rumen may be partly due to the physical association of Methanobrevibacter species with different microorganisms and host surfaces, which allows unique niches to be established.

    Foot and Mouth Disease Virus Genome


    Evolution and diversity of secretome genes in the apicomplexan parasite Theileria annulata

    BACKGROUND: Little is known about how apicomplexan parasites have evolved to infect different host species and cell types. Theileria annulata and Theileria parva invade and transform bovine leukocytes but each species favours a different host cell lineage. Parasite-encoded proteins secreted from the intracellular macroschizont stage within the leukocyte represent a critical interface between host and pathogen systems. Genome sequencing has revealed that several Theileria-specific gene families encoding secreted proteins are positively selected at the inter-species level, indicating diversification between the species. We extend this analysis to the intra-species level, focusing on allelic diversity of two major secretome families. These families represent a well-characterised group of genes implicated in control of the host cell phenotype and a gene family of unknown function. To gain further insight into their evolution and function, this study investigates whether representative genes of these two families are diversifying or constrained within the T. annulata population. RESULTS: Strong evidence is provided that the sub-telomerically encoded SVSP family and the host-nucleus targeted TashAT family have evolved under contrasting pressures within natural T. annulata populations. SVSP genes were found to possess atypical codon usage and be evolving neutrally, with high levels of nucleotide substitutions and multiple indels. No evidence of geographical sub-structuring of allelic sequences was found. In contrast, TashAT family genes, implicated in control of host cell gene expression, are strongly conserved at the protein level and geographically sub-structured allelic sequences were identified among Tunisian and Turkish isolates. Although different copy numbers of DNA binding motifs were identified in alleles of TashAT proteins, motif periodicity was strongly maintained, implying conserved functional activity of these sites.
    CONCLUSIONS: This analysis provides evidence that two distinct secretome gene families have evolved under contrasting selective pressures. The data support current hypotheses regarding the biological role of TashAT family proteins in the management of host cell phenotype that may have evolved to allow adaptation of T. annulata to a specific host cell lineage. We provide new evidence of extensive allelic diversity in representative members of the enigmatic SVSP gene family, which supports a putative role for the encoded products in subversion of the host immune response.

    Accuracy and responses of genomic selection on key traits in apple breeding

    The application of genomic selection in fruit tree crops is expected to enhance breeding efficiency by increasing prediction accuracy, increasing selection intensity and decreasing generation interval. The objectives of this study were to assess the accuracy of prediction and selection response in commercial apple breeding programmes for key traits. The training population comprised 977 individuals derived from 20 pedigreed full-sib families. Historic phenotypic data were available on 10 traits related to productivity and fruit external appearance and genotypic data for 7829 SNPs obtained with an Illumina 20K SNP array. From these data, a genome-wide prediction model was built and subsequently used to calculate genomic breeding values of five application full-sib families. The application families had genotypes at 364 SNPs from a dedicated 512 SNP array, and these genotypic data were extended to the high-density level by imputation. These five families were phenotyped for 1 year and their phenotypes were compared to the predicted breeding values. Accuracy of genomic prediction across the 10 traits reached a maximum value of 0.5 and had a median value of 0.19. The accuracies were strongly affected by the phenotypic distribution and heritability of traits. In the largest family, significant selection response was observed for traits with high heritability and symmetric phenotypic distribution. Traits that showed non-significant response often had reduced and skewed phenotypic variation or low heritability. Among the five application families the accuracies were uncorrelated with the degree of relatedness to the training population.
    The results underline the potential of genomic prediction to accelerate breeding progress in outbred fruit tree crops that still need to overcome long generation intervals and extensive phenotyping costs.
    Muranty, H.; Troggio, M.; Sadok, I.B.; Mehdi, A.R.; Auwerkerken, A.; Banchi, E.; Velasco, R.; Stevanato, P.; Eric van de Weg, W.; Di Guardo, M.; Kumar, S.; Laurens, F.; Bink, M.C.A.M.

    Homopolymeric tracts represent a general regulatory mechanism in prokaryotes

    Background: While, traditionally, regulation of gene expression can be grouped into transcriptional, translational, and post-translational mechanisms, some mechanisms of rapid genetic variation can also contribute to regulation of gene expression, e.g., phase variation. Results: We show here that prokaryotes evolved to include homopolymeric tracts (HTs) within coding genes as a system that allows for efficient gene inactivation. Analyses of 81 bacterial and 18 archaeal genomes showed that poly(A) and poly(T) HTs are overrepresented in these genomes and preferentially located at the 5' end of coding genes. Location of HTs at the 5' end is not driven by a preferential placement of amino acids encoded by the AAA and TTT codons at the N-terminus of proteins. The inlA gene of the pathogen L. monocytogenes was used as a model to further study the role of HTs in reversible gene inactivation. In a number of L. monocytogenes strains, inlA harbors a 5' poly(A) HT, which regularly shows frameshift mutation leading to expression of a truncated 8 aa InlA protein. Translational fusions of the inlA 5' end allowed us to estimate that the frequency of variation in this HT is about 1,000-fold higher than the estimated average point mutation frequency. Conclusions: As frameshift mutations in HTs can occur at high frequencies and enable efficient gene inactivation, hypermutable HTs appear to represent a universal system for regulation of gene expression in prokaryotes. Combined with other studies indicating that HTs also enable rapid diversification of both coding and regulatory genetic sequences in eukaryotes, our data suggest that hypermutable HTs represent a general and rapid evolutionary mechanism facilitating adaptation and gene regulation across diverse organisms.