Further Steps in TANGO: improved taxonomic assignment in metagenomics
Abstract
Motivation: TANGO is one of the most accurate tools for the taxonomic assignment of sequence reads. However, because of the differences in the taxonomy structures, performing a taxonomic assignment on different reference taxonomies will produce divergent results.
Results: We have improved the TANGO pipeline so that it can perform the taxonomic assignment of a metagenomic sample using alternative reference taxonomies from different sources. We highlight the novel pre-processing step necessary to accomplish this task and describe the improvements in the assignment process. We present the new TANGO pipeline in detail and, finally, show its performance on four real metagenomic datasets as well as on synthetic datasets.
Availability: The new version of TANGO, including implementation improvements and novel developments to perform the assignment on different reference taxonomies, is freely available at http://sourceforge.net/projects/taxoassignment/.
Contact: [email protected]
Genome Sequences of Two Tunisian Field Strains of Avian Mycoplasma, M. meleagridis and M. gallinarum
Mycoplasma meleagridis and Mycoplasma gallinarum are bacteria that affect birds, but little is known about the genetic basis of their interaction with chickens and other poultry. Here, we sequenced the genomes of M. meleagridis strain MM_26B8_IPT and M. gallinarum strain Mgn_IPT, both isolated from chickens showing respiratory symptoms, poor growth, reduction in hatchability, and loss of production
Comparative genomic and proteomic analyses of two Mycoplasma agalactiae strains: clues to the macro- and micro-events that are shaping mycoplasma diversity
Background: While the genomic era is accumulating a tremendous amount of data, the question of how genomics can describe a bacterial species remains to be fully addressed. The recent sequencing of the genome of the Mycoplasma agalactiae type strain has challenged our general view of mycoplasmas by suggesting that these simple bacteria are able to exchange significant amounts of genetic material via horizontal gene transfer. Yet the events that shape mycoplasma genomes and underlie diversity within this species have still to be fully evaluated. For this purpose, we compared two strains that are representative of the genetic spectrum encountered in this species: the type strain PG2, whose genome is already available, and a field strain, 5632, which was fully sequenced and annotated in this study. Results: The two genomes differ by ca. 130 kbp, with that of 5632 being the larger (1006 kbp). This additional genetic material mainly corresponds (i) to mobile genetic elements and (ii) to an expanded repertoire of gene families that encode putative surface proteins and display features of highly variable systems. More specifically, three entire copies of a previously described integrative conjugative element are found in 5632 and account for ca. 80 kbp. Other mobile genetic elements, found in 5632 but not in PG2, are the more classical insertion sequences, which are related to those found in two other ruminant pathogens, M. bovis and M. mycoides subsp. mycoides SC. In 5632, repertoires of gene families encoding surface proteins are larger due to gene duplication. Comparative proteomic analyses of the two strains indicate that the additional coding capacity of 5632 affects the overall architecture of the surface and suggest the occurrence of new phase-variable systems based on single nucleotide polymorphisms. Conclusion: Overall, comparative analyses of two M. agalactiae strains revealed a very dynamic genome whose structure has been shaped by gene flow among ruminant mycoplasmas and by expansion-reduction of gene repertoires encoding surface proteins, the expression of which is driven by localized genetic micro-events
Being Pathogenic, Plastic, and Sexual while Living with a Nearly Minimal Bacterial Genome
Mycoplasmas are commonly described as the simplest self-replicating organisms, whose evolution was mainly characterized by genome downsizing with a proposed evolutionary scenario similar to that of obligate intracellular bacteria such as insect endosymbionts. Thus far, analysis of mycoplasma genomes indicates a low level of horizontal gene transfer (HGT) implying that DNA acquisition is strongly limited in these minimal bacteria. In this study, the genome of the ruminant pathogen Mycoplasma agalactiae was sequenced. Comparative genomic data and phylogenetic tree reconstruction revealed that ∼18% of its small genome (877,438 bp) has undergone HGT with the phylogenetically distinct mycoides cluster, which is composed of significant ruminant pathogens. HGT involves genes often found as clusters, several of which encode lipoproteins that usually play an important role in mycoplasma–host interaction. A decayed form of a conjugative element also described in a member of the mycoides cluster was found in the M. agalactiae genome, suggesting that HGT may have occurred by mobilizing a related genetic element. The possibility of HGT events among other mycoplasmas was evaluated with the available sequenced genomes. Our data indicate marginal levels of HGT among Mycoplasma species except for those described above and, to a lesser extent, for those observed in between the two bird pathogens, M. gallisepticum and M. synoviae. This first description of large-scale HGT among mycoplasmas sharing the same ecological niche challenges the generally accepted evolutionary scenario in which gene loss is the main driving force of mycoplasma evolution. The latter clearly differs from that of other bacteria with small genomes, particularly obligate intracellular bacteria that are isolated within host cells. 
Consequently, mycoplasmas are not only able to subvert complex hosts but presumably have retained sexual competence, a trait that may prevent them from genome stasis and contribute to adaptation to new hosts
A novel substitution matrix fitted to the compositional bias in Mollicutes improves the prediction of homologous relationships
Abstract
Background: Substitution matrices are key parameters for the alignment of two protein sequences, and consequently for most comparative genomics studies. The composition of biological sequences can vary considerably between species and groups of species, and classical matrices such as those in the BLOSUM series fail to accurately estimate alignment scores and statistical significance for sequences sharing marked compositional biases.
Results: We present a general and simple methodology to build matrices that are specifically fitted to the compositional bias of proteins. Our approach is inspired by the one used to build the BLOSUM matrices and is based on learning substitution and amino acid frequencies from real sequences with the corresponding compositional bias. We applied it to the large-scale comparison of AT-rich Mollicute genomes. The new matrix, MOLLI60, was used to predict pairwise orthology relationships, as well as homolog families, among 24 Mollicute genomes. We show that, compared with the classical matrix BLOSUM62, this new matrix discriminates better between true and false orthologs and improves the clustering of homologous proteins.
Conclusions: We show in this paper that well-fitted matrices can improve the predictions of orthologous and homologous relationships among proteins with a similar compositional bias. With the ever-increasing number of sequenced genomes, our approach could prove valuable in numerous comparative studies focusing on atypical genomes.
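As a rough illustration of the BLOSUM-style idea summarized above, the following Python sketch derives log-odds scores from observed residue pairs. It is a minimal sketch under stated assumptions, not the procedure used to build MOLLI60: the function name `log_odds_matrix`, the toy input, and the half-bit scaling are all illustrative choices, and the sequence clustering and weighting steps used for the real BLOSUM matrices are omitted.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, scale=2.0):
    """Return {(a, b): score} from residue pairs seen in trusted alignments.

    Hypothetical helper: scores are scale * log2(observed / expected),
    rounded to integers (scale=2.0 gives half-bit units).
    """
    pair_counts = Counter()
    for a, b in aligned_pairs:
        # Count both orientations so the resulting matrix is symmetric.
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1

    total = sum(pair_counts.values())
    observed = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency of each residue, taken from the same pairs,
    # so a biased composition is reflected in the expected frequencies.
    background = Counter()
    for (a, _), freq in observed.items():
        background[a] += freq

    return {
        (a, b): round(scale * math.log2(freq / (background[a] * background[b])))
        for (a, b), freq in observed.items()
    }


# Toy input: mostly conserved positions, a few A/K substitutions.
toy_pairs = [("A", "A")] * 8 + [("K", "K")] * 8 + [("A", "K")] * 2
matrix = log_odds_matrix(toy_pairs)
print(matrix[("A", "A")], matrix[("A", "K")])
```

Because the background frequencies are estimated from the training pairs themselves, a matrix trained on compositionally biased proteins scores substitutions relative to that bias rather than to a generic protein composition.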
Bioinformatic analysis of ESTs collected by Sanger and pyrosequencing methods for a keystone forest tree species: oak
Abstract
Background: The Fagaceae family comprises about 1,000 woody species worldwide. About half belong to the genus Quercus. These oaks are often a source of raw material for biomass wood and fiber. Pedunculate and sessile oaks are among the most important deciduous forest tree species in Europe. Despite their ecological and economic importance, very few genomic resources have yet been generated for these species. Here, we describe the development of an EST catalogue that will support ecosystem genomics studies, where geneticists, ecophysiologists, molecular biologists and ecologists join their efforts for understanding, monitoring and predicting functional genetic diversity.
Results: We generated 145,827 sequence reads from 20 cDNA libraries using the Sanger method. Unexploitable chromatograms and quality checking led us to eliminate 19,941 sequences. Finally, a total of 125,925 ESTs were retained from 111,361 cDNA clones. Pyrosequencing was also conducted for 14 libraries, generating 1,948,579 reads, from which 370,566 sequences (19.0%) were eliminated, resulting in 1,578,192 sequences. Following clustering and assembly using the TGICL pipeline, 1,704,117 EST sequences collapsed into 69,154 tentative contigs and 153,517 singletons, providing 222,671 non-redundant sequences (including alternative transcripts). We also assembled the sequences using MIRA and PartiGene software and compared the three unigene sets. Gene ontology annotation was then assigned to 29,303 unigene elements. A BLAST search against the SWISS-PROT database revealed putative homologs for 32,810 (14.7%) unigene elements, but a more extensive search with Pfam, Refseq_protein, Refseq_RNA and eight gene indices revealed homology for 67.4% of them. The EST catalogue was examined for putative homologs of candidate genes involved in bud phenology, cuticle formation, phenylpropanoid biosynthesis and cell wall formation. Our results suggest a good coverage of genes involved in these traits. Comparative orthologous sequences (COS) with other plant gene models were identified and make it possible to unravel the oak paleo-history. Simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) were searched, resulting in 52,834 SSRs and 36,411 SNPs. All of these are available through the Oak Contig Browser http://genotoul-contigbrowser.toulouse.inra.fr:9092/Quercus_robur/index.html.
Conclusions: This genomic resource provides a unique tool to discover genes of interest, study the oak transcriptome, and develop new markers to investigate functional diversity in natural populations.
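The assembly figures quoted in the Results above can be cross-checked with a few lines of arithmetic. The Python sketch below only re-uses numbers stated in the abstract; the variable names and the `pct` helper are illustrative and not part of the original work.

```python
# Counts quoted in the abstract.
sanger_ests = 125_925
pyro_seqs = 1_578_192
assembly_input = 1_704_117
contigs = 69_154
singletons = 153_517
unigenes = 222_671
swissprot_hits = 32_810
pyro_raw = 1_948_579
pyro_removed = 370_566


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


checks = {
    "assembly input = Sanger ESTs + retained pyrosequencing reads":
        sanger_ests + pyro_seqs == assembly_input,
    "non-redundant set = contigs + singletons":
        contigs + singletons == unigenes,
    "SWISS-PROT hits are 14.7% of unigenes":
        pct(swissprot_hits, unigenes) == 14.7,
    "19.0% of pyrosequencing reads were eliminated":
        pct(pyro_removed, pyro_raw) == 19.0,
}

for label, ok in checks.items():
    print(("ok: " if ok else "MISMATCH: ") + label)
```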
Molecular and phenotypic profiling from base to the crown in maritime pine wood-forming tissue
• Environmental, developmental and genetic factors affect variation in wood properties at the chemical, anatomical and physical levels. Here, the phenotypic variation observed along the tree stem was explored and the hypothesis tested that this variation could be the result of the differential expression of genes/proteins during wood formation.
• Differentiating xylem samples of maritime pine (Pinus pinaster) were collected from the top (crown wood, CW) to the bottom (base wood, BW) of adult trees. These samples were characterized by Fourier transform infrared spectroscopy (FTIR) and analytical pyrolysis. Two main groups of samples, corresponding to CW and BW, could be distinguished from cell wall chemical composition.
• A genomic approach, combining large-scale production of expressed sequence tags (ESTs), gene expression profiling and quantitative proteomics analysis, allowed identification of 262 unigenes (out of 3512) and 231 proteins (out of 1372 spots) that were differentially expressed along the stem.
• A good relationship was found between functional categories from transcriptomic and proteomic data. A good fit between the molecular mechanisms involved in CW–BW formation and these two types of wood phenotypic differences was also observed. This work provides a list of candidate genes for wood properties that will be tested in forward genetic
Life on Arginine for Mycoplasma hominis: Clues from Its Minimal Genome and Comparison with Other Human Urogenital Mycoplasmas
Mycoplasma hominis is an opportunistic human mycoplasma. Two other pathogenic human species, M. genitalium and Ureaplasma parvum, reside within the same natural niche as M. hominis: the urogenital tract. These three species have overlapping, but distinct, pathogenic roles. They have minimal genomes and, thus, reduced metabolic capabilities characterized by distinct energy-generating pathways. Analysis of the M. hominis PG21 genome sequence revealed that it is the second smallest genome among self-replicating free-living organisms (665,445 bp, 537 coding sequences (CDSs)). Five clusters of genes were predicted to have undergone horizontal gene transfer (HGT) between M. hominis and the phylogenetically distant U. parvum species. We reconstructed M. hominis metabolic pathways from the predicted genes, with particular emphasis on energy-generating pathways. The Embden–Meyerhof–Parnas pathway was incomplete, with a single enzyme absent. We identified the three proteins constituting the arginine dihydrolase pathway. This pathway was found to be essential to promote growth in vivo. The predicted presence of dimethylarginine dimethylaminohydrolase suggested that arginine catabolism is more complex than initially described. This enzyme may have been acquired by HGT from non-mollicute bacteria. Comparison of the three minimal mollicute genomes showed that 247 CDSs were common to all three genomes, whereas 220 CDSs were specific to M. hominis, 172 CDSs were specific to M. genitalium, and 280 CDSs were specific to U. parvum. Within these species-specific genes, two major sets of genes could be identified: one including genes involved in various energy-generating pathways, depending on the energy source used (glucose, urea, or arginine), and another involved in cytadherence and virulence.
Therefore, a minimal mycoplasma cell, not including cytadherence and virulence-related genes, could be envisaged containing a core genome (247 genes), plus a set of genes required for providing energy. For M. hominis, this set would include 247+9 genes, resulting in a theoretical minimal genome of 256 genes
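The core versus species-specific split described above amounts to set operations over orthologous gene groups. The Python sketch below shows one way such a partition could be computed; it is an illustration only, and the function `core_and_specific`, the group identifiers (`og1`, `og2`, ...) and the toy gene content are hypothetical rather than taken from the study.

```python
def core_and_specific(genomes):
    """Split ortholog groups into a shared core and per-genome specific sets.

    `genomes` maps a genome name to the set of ortholog-group identifiers
    found in it (a hypothetical input format).
    """
    core = set.intersection(*genomes.values())
    specific = {}
    for name, groups in genomes.items():
        # Union of every group seen in any other genome.
        others = set().union(*(g for n, g in genomes.items() if n != name))
        specific[name] = groups - others
    return core, specific


# Toy data: og5 is shared by two genomes, so it is neither core nor specific.
toy_genomes = {
    "M. hominis": {"og1", "og2", "og3", "og5"},
    "M. genitalium": {"og1", "og2", "og4"},
    "U. parvum": {"og1", "og2", "og5", "og6"},
}
core, specific = core_and_specific(toy_genomes)
print(sorted(core), {name: sorted(s) for name, s in specific.items()})

# The theoretical minimal genome quoted in the abstract: core plus energy genes.
minimal_genome_size = 247 + 9
```

Groups present in exactly two of the three genomes fall into neither category, which is why the core and species-specific counts for a genome do not have to add up to its total number of CDSs.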
MolliGen, a database dedicated to the comparative genomics of Mollicutes
Bacteria belonging to the class Mollicutes were among the first to be selected for complete genome sequencing because of the minimal size of their genomes and their pathogenicity for humans and a broad range of animals and plants. At this time, six genome sequences have been publicly released (Mycoplasma genitalium, Mycoplasma pneumoniae, Ureaplasma urealyticum-parvum, Mycoplasma pulmonis, Mycoplasma penetrans and Mycoplasma gallisepticum), and as the number of available mollicute genomes increases, comparative genomics analysis within this model group of organisms becomes more and more instructive. However, such an analysis is difficult to carry out without a suitable platform gathering not only the original annotations but also relevant information available in public databases or obtained by applying common bioinformatics methods. To address these difficulties, we have developed a web-accessible database named MolliGen (http://cbi.labri.fr/outils/molligen/). After selecting a set of genomes, the user can launch various types of search based on annotation, position on the chromosomes or sequence similarity. In addition, relationships of putative orthology have been precomputed to allow differential genome queries. The results are presented in table format with multiple links to public databases and to bioinformatic analyses such as multiple alignments or BLAST searches. Specific tools were also developed for the graphical visualization of the results, including a multi-genome browser for displaying dynamic pictures with clickable objects and for viewing relationships of precomputed similarity. MolliGen is designed to integrate all the complete genomes of mollicutes as they become available