61 research outputs found

    A novel substitution matrix fitted to the compositional bias in Mollicutes improves the prediction of homologous relationships

    Background: Substitution matrices are key parameters for the alignment of two protein sequences, and consequently for most comparative genomics studies. The composition of biological sequences can vary substantially between species and groups of species, and classical matrices such as those in the BLOSUM series fail to accurately estimate alignment scores and statistical significance for sequences sharing marked compositional biases. Results: We present a general and simple methodology to build matrices that are specifically fitted to the compositional bias of proteins. Our approach is inspired by the one used to build the BLOSUM matrices and is based on learning substitution and amino acid frequencies from real sequences with the corresponding compositional bias. We applied it to the large-scale comparison of Mollicute AT-rich genomes. The new matrix, MOLLI60, was used to predict pairwise orthology relationships, as well as homolog families, among 24 Mollicute genomes. We show that this new matrix discriminates better between true and false orthologs and improves the clustering of homologous proteins compared with the classical matrix BLOSUM62. Conclusions: We show in this paper that well-fitted matrices can improve the prediction of orthologous and homologous relationships among proteins with a similar compositional bias. With the ever-increasing number of sequenced genomes, our approach could prove valuable in numerous comparative studies focusing on atypical genomes.
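The abstract above says only that the matrix is learned from substitution and amino acid frequencies in a BLOSUM-inspired way. As a rough illustration, the sketch below applies the standard BLOSUM log-odds formula (scores in half-bit units) to toy alignment columns. The function names, the two-letter alphabet, and the input columns are hypothetical, and nothing here is claimed to reproduce the actual MOLLI60 procedure.

```python
import math
from collections import Counter
from itertools import combinations


def pair_counts(columns):
    """Count unordered residue pairs within each alignment column."""
    counts = Counter()
    for col in columns:
        for a, b in combinations(col, 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts


def log_odds_scores(counts):
    """Turn pair counts into rounded half-bit log-odds scores (BLOSUM-style)."""
    total = sum(counts.values())
    q = {pair: n / total for pair, n in counts.items()}  # observed pair frequencies

    # Background frequency of each residue, derived from the pair frequencies.
    p = Counter()
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2

    scores = {}
    for (a, b), f in q.items():
        expected = p[a] ** 2 if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(f / expected))
    return scores


# Toy columns (hypothetical data): each string is one aligned column.
toy_columns = ["AAAA", "AAKK", "KKKK"]
toy_scores = log_odds_scores(pair_counts(toy_columns))
```

Because the background frequencies are learned from the same sequences as the pair counts, a compositionally biased training set shifts the expected pair frequencies, and the scores shift with them. That is the general idea behind fitting a matrix to a biased proteome.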

    Specific Evolution of F1-Like ATPases in Mycoplasmas

    F1F0 ATPases have been identified in most bacteria, including mycoplasmas, which have very small genomes associated with a host-dependent lifestyle. In addition to the typical operon of eight genes encoding genuine F1F0 ATPase (Type 1), we identified related clusters of seven genes in many mycoplasma species. Four of the encoded proteins have predicted structures similar to the α, β, γ and ε subunits of F1 ATPases and could form an F1-like ATPase. The other three proteins display no similarity to any other known proteins. Two of these proteins are probably located in the membrane, as they have three and twelve predicted transmembrane helices. Phylogenomic studies identified two types of F1-like ATPase clusters, Type 2 and Type 3, characterized by a rapid evolution of sequences with the conservation of structural features. Clusters encoding Type 2 and Type 3 ATPases were assumed to originate from the Hominis group of mycoplasmas. We suggest that Type 3 ATPase clusters may spread to other phylogenetic groups by horizontal gene transfer between mycoplasmas in the same host, based on phylogeny and genomic context. Functional analyses in the ruminant pathogen Mycoplasma mycoides subsp. mycoides showed that the Type 3 cluster genes were organized into an operon. Proteomic analyses demonstrated that the seven encoded proteins were produced during growth in axenic media. Mutagenesis and complementation studies demonstrated an association of the Type 3 cluster with a major ATPase activity of membrane fractions. Thus, despite their tendency toward genome reduction, mycoplasmas have evolved and exchanged specific F1-like ATPases with no known equivalent in other bacteria. We propose a model in which the F1-like structure is associated with a hypothetical X0 sector located in the membrane of mycoplasma cells.

    Genome-Scale Analysis of Mycoplasma agalactiae Loci Involved in Interaction with Host Cells

    Mycoplasma agalactiae is an important pathogen of small ruminants, in which it causes contagious agalactia. It belongs to a large group of “minimal bacteria” with a small genome and reduced metabolic capacities that are dependent on their host for nutrients. Mycoplasma survival thus relies on intimate contact with host cells, but little is known about the factors involved in these interactions or in the more general infectious process. To address this issue, an assay based on goat epithelial and fibroblastic cells was used to screen a M. agalactiae knockout mutant library. Mutants with reduced growth capacities in cell culture were selected and 62 genomic loci were identified as contributing to this phenotype. As expected for minimal bacteria, “transport and metabolism” was the functional category most commonly implicated in this phenotype, but 50% of the selected mutants were disrupted in coding sequences (CDSs) with unknown functions, with surface lipoproteins being most commonly represented in this category. Since mycoplasmas lack a cell wall, lipoproteins are likely to be important in interactions with the host. A few intergenic regions were also identified that may act as regulatory sequences under co-culture conditions. Interestingly, some mutants mapped to gene clusters that are highly conserved across mycoplasma species but located in different positions. One of these clusters was found in a transcriptionally active region of the M. agalactiae chromosome, downstream of a cryptic promoter. A possible scenario for the evolution of these loci is discussed. Finally, several CDSs identified here are conserved in other important pathogenic mycoplasmas, and some were involved in horizontal gene transfer with phylogenetically distant species. These results provide a basis for further deciphering functions mediating mycoplasma-host interactions

    Origination of the Split Structure of Spliceosomal Genes from Random Genetic Sequences

    The mechanism by which protein-coding portions of eukaryotic genes came to be separated by long non-coding stretches of DNA, and the purpose of this perplexing arrangement, have remained unresolved fundamental biological problems for three decades. We report here a plausible solution to this problem based on analysis of open reading frame (ORF) length constraints in the genomes of nine diverse species. If primordial nucleic acid sequences were random, functional proteins that are innately long would not be encoded, due to the frequent occurrence of stop codons. The best possible way that a long protein-coding sequence could have been derived was by evolving a split structure from the random DNA (or RNA) sequence. Results of the systematic analyses of nine complete genome sequences presented here suggest that the major underlying structural features of split genes may have evolved due to the indigenous occurrence of split protein-coding genes in primordial random nucleotide sequences. The results also suggest that intron-rich genes containing short exons may have been the original form of genes intrinsically occurring in random DNA, and that intron-poor genes containing long exons were perhaps derived from the original intron-rich genes.
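The ORF length constraint mentioned above can be illustrated with a short calculation. The sketch below assumes independent bases with fixed frequencies (uniform by default), which is a simplification and not the genome-based analysis reported in the abstract. The function names are hypothetical.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")  # standard genetic code


def stop_probability(base_freqs):
    """Probability that a random codon is a stop, assuming independent bases."""
    return sum(
        base_freqs[c[0]] * base_freqs[c[1]] * base_freqs[c[2]]
        for c in STOP_CODONS
    )


def prob_open_for(n_codons, p_stop):
    """Probability that n consecutive random codons contain no stop codon."""
    return (1.0 - p_stop) ** n_codons


def mean_codons_before_stop(p_stop):
    """Expected number of sense codons before the first stop (geometric model)."""
    return (1.0 - p_stop) / p_stop


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
p_stop = stop_probability(uniform)          # 3 of the 64 codons are stops
mean_len = mean_codons_before_stop(p_stop)  # 61/3, roughly 20 codons
p_100 = prob_open_for(100, p_stop)          # chance of 100 stop-free codons
```

Under these assumptions, a stop-free run of 100 codons occurs with probability below 1%. This is the kind of constraint that makes long uninterrupted coding sequences unlikely in random DNA.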

    Life on Arginine for Mycoplasma hominis: Clues from Its Minimal Genome and Comparison with Other Human Urogenital Mycoplasmas

    Mycoplasma hominis is an opportunistic human mycoplasma. Two other pathogenic human species, M. genitalium and Ureaplasma parvum, reside within the same natural niche as M. hominis: the urogenital tract. These three species have overlapping, but distinct, pathogenic roles. They have minimal genomes and, thus, reduced metabolic capabilities characterized by distinct energy-generating pathways. Analysis of the M. hominis PG21 genome sequence revealed that it is the second smallest genome among self-replicating free-living organisms (665,445 bp, 537 coding sequences (CDSs)). Five clusters of genes were predicted to have undergone horizontal gene transfer (HGT) between M. hominis and the phylogenetically distant U. parvum species. We reconstructed M. hominis metabolic pathways from the predicted genes, with particular emphasis on energy-generating pathways. The Embden–Meyerhof–Parnas pathway was incomplete, with a single enzyme absent. We identified the three proteins constituting the arginine dihydrolase pathway. This pathway was found essential to promote growth in vivo. The predicted presence of dimethylarginine dimethylaminohydrolase suggested that arginine catabolism is more complex than initially described. This enzyme may have been acquired by HGT from non-mollicute bacteria. Comparison of the three minimal mollicute genomes showed that 247 CDSs were common to all three genomes, whereas 220 CDSs were specific to M. hominis, 172 CDSs were specific to M. genitalium, and 280 CDSs were specific to U. parvum. Within these species-specific genes, two major sets of genes could be identified: one including genes involved in various energy-generating pathways, depending on the energy source used (glucose, urea, or arginine) and another involved in cytadherence and virulence. Therefore, a minimal mycoplasma cell, not including cytadherence and virulence-related genes, could be envisaged containing a core genome (247 genes), plus a set of genes required for providing energy. For M. hominis, this set would include 247+9 genes, resulting in a theoretical minimal genome of 256 genes.
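The core and species-specific counts above come from a three-way genome comparison. As a minimal sketch of that bookkeeping, the code below treats each genome as a set of ortholog-family identifiers and splits them into a shared core and per-genome specific sets. The function name, genome labels, and family IDs are hypothetical toy data, not the published gene sets.

```python
def core_and_specific(genomes):
    """Split genomes (name -> set of family IDs) into core and specific sets."""
    core = set.intersection(*genomes.values())
    specific = {}
    for name, families in genomes.items():
        # Families seen in any other genome are not specific to this one.
        others = set().union(*(f for n, f in genomes.items() if n != name))
        specific[name] = families - others
    return core, specific


# Toy example: "fam3" is shared by two genomes, so it is neither core nor specific.
toy_genomes = {
    "genome_A": {"fam1", "fam2", "fam3", "energy_A"},
    "genome_B": {"fam1", "fam2", "fam3", "energy_B"},
    "genome_C": {"fam1", "fam2", "energy_C"},
}
core, specific = core_and_specific(toy_genomes)

# Theoretical minimal set for one genome: core plus its energy-related families.
minimal_A = core | specific["genome_A"]
```

With the figures quoted in the abstract, the same arithmetic gives 247 core genes plus 9 energy-related genes, for a total of 256.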

    Comparative Genomics of Mycoplasma: Analysis of Conserved Essential Genes and Diversity of the Pan-Genome

    Mycoplasma, the smallest self-replicating organism with a minimal metabolism and little genomic redundancy, is expected to be a close approximation to the minimal set of genes needed to sustain bacterial life. This study employs comparative evolutionary analysis of twenty Mycoplasma genomes to gain an improved understanding of essential genes. By analyzing the core genome of mycoplasmas, we identified the set of conserved essential genes for mycoplasma survival. Further analysis showed that the core genome set has many characteristics in common with experimentally identified essential genes. Several key genes, which are related to DNA replication and repair and can be disrupted in transposon mutagenesis studies, may be critical for bacterial survival, especially over long periods of natural selection. Phylogenomic reconstructions based on 3,355 homologous groups allowed robust estimation of phylogenetic relatedness among mycoplasma strains. To obtain deeper insight into the role of molecular evolution in pathogen adaptation to hosts, we also analyzed the positive selection pressures on particular sites and lineages. There appears to be an approximate correlation between the divergence of species and the level of positive selection detected in the corresponding lineages.

    Proteomics Characterization of Cytoplasmic and Lipid-Associated Membrane Proteins of Human Pathogen Mycoplasma fermentans M64

    Mycoplasma fermentans is a potent human pathogen which has been implicated in several diseases. Notably, its lipid-associated membrane proteins (LAMPs) play a role in immunomodulation and the development of infection-associated inflammatory diseases. However, systematic protein identification in pathogenic M. fermentans has not been reported. Our recent sequencing of M. fermentans M64, isolated from the human respiratory tract, showed that its genome is around 1.1 Mb and encodes 1050 predicted protein-coding genes. In the present study, the soluble proteome of M. fermentans was resolved and analyzed using two-dimensional gel electrophoresis. In addition, Triton X-114 extraction was carried out to enrich amphiphilic proteins, including putative lipoproteins and membrane proteins. Subsequent mass spectrometric analyses of these proteins identified a total of 181 M. fermentans ORFs. Further bioinformatics analysis of these ORFs, which encode proteins with known or so far unknown orthologues among bacteria, revealed that 131 proteins are homologous to known proteins, 11 are conserved hypothetical proteins, and the remaining 39 are likely M. fermentans-specific proteins. Moreover, the Triton X-114-enriched fraction was shown to activate NF-κB in RAW264.7 macrophages, and a total of 21 lipoproteins with a predicted signal peptide were identified in it. Together, our work provides the first proteome reference map of M. fermentans as well as several putative virulence-associated proteins as diagnostic markers or vaccine candidates for further functional study of this human pathogen.

    Complexity of the Mycoplasma fermentans M64 Genome and Metabolic Essentiality and Diversity among Mycoplasmas

    Recently, the genomes of two Mycoplasma fermentans strains, namely M64 and JER, have been completely sequenced. A gross comparison indicated that the M64 genome is substantially larger than that of JER, and that the difference is mainly due to repetitive sequences, including seven families of simple and complex transposable elements ranging from 973 to 23,778 bp. Analysis of these repeats resulted in the identification of a new distinct family of Integrative Conjugal Elements of M. fermentans, designated as ICEF-III. Using the concept of “reaction connectivity”, the metabolic capabilities of M. fermentans, as manifested by completely and partially connected biomodules, were revealed. A comparison of the reported M. pulmonis, M. arthritidis, M. genitalium, B. subtilis, and E. coli essential genes with the genes predicted from the M64 genome indicated that more than 73% of the Mycoplasma essential genes are preserved in M. fermentans. Further examination of the highly and partly connected reactions by a novel combination of phylogenetic tree, metabolic network, and essential gene analyses indicated that some of the pathways with partially connected reactions (e.g. purine and pyrimidine metabolism) may be important for the conversion of intermediate metabolites. Taken together, in light of systems and network analyses, the diversity among Mycoplasma species is manifested in variations of their limited metabolic abilities during evolution.