Transcript profiling in Candida albicans reveals new cellular functions for the transcriptional repressors CaTup1, CaMig1 and CaNrg1.
The pathogenic fungus Candida albicans contains homologues of the transcriptional repressors ScTup1, ScMig1 and ScNrg1 found in budding yeast. In Saccharomyces cerevisiae, ScMig1 targets the ScTup1/ScSsn6 complex to the promoters of glucose-repressed genes to repress their transcription. ScNrg1 is thought to act in a similar manner at other promoters. We have examined the roles of their homologues in C. albicans by transcript profiling with an array containing 2002 genes, representing about one quarter of the predicted number of open reading frames (ORFs) in C. albicans. The data revealed that CaNrg1 and CaTup1 regulate a different set of C. albicans genes from CaMig1 and CaTup1. This is consistent with the idea that CaMig1 and CaNrg1 target the CaTup1 repressor to specific subsets of C. albicans genes. However, CaMig1 and CaNrg1 repress other C. albicans genes in a CaTup1-independent fashion. The targets of CaMig1 and CaNrg1 repression, and phenotypic analyses of nrg1/nrg1 and mig1/mig1 mutants, indicate that these factors play differential roles in the regulation of metabolism, cellular morphogenesis and stress responses. Hence, the data provide important information both about the modes of action of these transcriptional regulators and their cellular roles. The transcript profiling data are available at http://www.pasteur.fr/recherche/unites/RIF/transcriptdata/
Proteome sequence features carry signatures of the environmental niche of prokaryotes
Background: Prokaryotic environmental adaptations occur at different levels within cells to ensure the preservation of genome integrity, proper protein folding and function as well as membrane fluidity. Although specific composition and structure of cellular components suitable for the variety of extreme conditions has already been postulated, a systematic study describing such adaptations has not yet been performed. We therefore explored whether the environmental niche of a prokaryote could be deduced from the sequence of its proteome. Finally, we aimed at finding the precise differences between proteome sequences of prokaryotes from different environments. Results: We analyzed the proteomes of 192 prokaryotes from different habitats. We collected detailed information about the optimal growth conditions of each microorganism. Furthermore, we selected 42 physico-chemical properties of amino acids and computed their values for each proteome. Further, on the same set of features we applied two fundamentally different machine learning methods, Support Vector Machines and Random Forests, to successfully classify between bacteria and archaea, halophiles and non-halophiles, as well as mesophiles, thermophiles and mesothermophiles. Finally, we performed feature selection by using Random Forests. Conclusions: To our knowledge, this is the first time that three different classification cases (domain of life, halophilicity and thermophilicity) of proteome adaptation are successfully performed with the same set of 42 features. The characteristic features of a specific adaptation constitute a signature that may help understanding the mechanisms of adaptation to extreme environments.
A probabilistic model for gene content evolution with duplication, loss, and horizontal transfer
We introduce a Markov model for the evolution of a gene family along a
phylogeny. The model includes parameters for the rates of horizontal gene
transfer, gene duplication, and gene loss, in addition to branch lengths in the
phylogeny. The likelihood for the changes in the size of a gene family across
different organisms can be calculated in O(N+hM^2) time and O(N+M^2) space,
where N is the number of organisms, h is the height of the phylogeny, and M
is the sum of family sizes. We apply the model to the evolution of gene content
in Proteobacteria using the gene families in the COG (Clusters of Orthologous
Groups) database
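As a rough illustration of the kind of process this abstract describes, the sketch below simulates the size of one gene family along a single branch. It assumes a linear birth-death process with per-copy duplication and loss rates and a constant transfer (gain) rate; the abstract does not state the exact parameterization, and the name `simulate_family_size` and its arguments are hypothetical. It is a forward simulation only, not the likelihood algorithm from the paper.

```python
import random


def simulate_family_size(n0, t, dup, loss, gain, rng):
    """Simulate gene family size after time t on one branch.

    Assumed model (not taken from the paper): each copy duplicates at
    rate `dup` and is lost at rate `loss`; new copies arrive by
    horizontal transfer at constant rate `gain`.
    """
    n, elapsed = n0, 0.0
    while True:
        total_rate = n * (dup + loss) + gain
        if total_rate == 0.0:
            return n  # no event can occur any more
        # Waiting time to the next event is exponentially distributed.
        elapsed += rng.expovariate(total_rate)
        if elapsed > t:
            return n  # the branch ends before the next event
        u = rng.random() * total_rate
        if u < n * dup:
            n += 1  # duplication of an existing copy
        elif u < n * (dup + loss):
            n -= 1  # loss of an existing copy
        else:
            n += 1  # gain by horizontal transfer


if __name__ == "__main__":
    rng = random.Random(1)
    sizes = [simulate_family_size(2, 1.0, 0.3, 0.2, 0.1, rng) for _ in range(5)]
    print(sizes)
```

Repeating the simulation over many runs gives an empirical distribution of family sizes at the end of a branch, which is the quantity a transition probability in such a model would describe.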
Phylogeny of Prokaryotes and Chloroplasts Revealed by a Simple Composition Approach on All Protein Sequences from Complete Genomes Without Sequence Alignment
The complete genomes of living organisms have provided much information on their phylogenetic relationships. Similarly, the complete genomes of chloroplasts have helped to resolve the evolution of this organelle in photosynthetic eukaryotes. In this paper we propose an alternative method of phylogenetic analysis using compositional statistics for all protein sequences from complete genomes. This new method is conceptually simpler than and computationally as fast as the one proposed by Qi et al. (2004b) and Chu et al. (2004). The same data sets used in Qi et al. (2004b) and Chu et al. (2004) are analyzed using the new method. Our distance-based phylogenetic tree of the 109 prokaryotes and eukaryotes agrees with the biologists' tree of life based on 16S rRNA comparison in a predominant majority of basic branching and most lower taxa. Our phylogenetic analysis also shows that the chloroplast genomes are separated into two major clades corresponding to chlorophytes s.l. and rhodophytes s.l. The interrelationships among the chloroplasts are largely in agreement with the current understanding of chloroplast evolution
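To make the idea of an alignment-free, composition-based distance concrete, here is a minimal sketch. It counts overlapping k-residue strings across all proteins of a genome and compares two genomes by cosine distance. This is a generic illustration, not the statistic proposed in the paper; the function names, the choice of k, and the use of cosine distance are all assumptions.

```python
from collections import Counter
from math import sqrt


def composition_vector(proteins, k=3):
    """Relative frequencies of overlapping k-strings over all proteins."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: c / total for kmer, c in counts.items()}


def cosine_distance(u, v):
    """1 - cosine similarity between two sparse frequency vectors."""
    dot = sum(x * v.get(kmer, 0.0) for kmer, x in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy "proteomes"; real input would be all proteins of a genome.
    genome_a = ["MKVLAAGIVGLM", "MSTNPKPQRK"]
    genome_b = ["MKVLAAGIVALM", "MSTNPKPQKK"]
    va = composition_vector(genome_a)
    vb = composition_vector(genome_b)
    print(cosine_distance(va, vb))
```

A matrix of such pairwise distances could then be passed to any distance-based tree-building method, such as neighbor joining.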
CandidaDB: a genome database for Candida albicans pathogenomics
CandidaDB is a database dedicated to the genome of the most prevalent systemic fungal pathogen of humans, Candida albicans. CandidaDB is based on an annotation of the Stanford Genome Technology Center C.albicans genome sequence data by the European Galar Fungail Consortium. CandidaDB Release 2.0 (June 2004) contains information pertaining to Assembly 19 of the genome of C.albicans strain SC5314. The current release contains 6244 annotated entries corresponding to 130 tRNA genes and 5917 protein-coding genes. For these, it provides tentative functional assignments along with numerous pre-run analyses that can assist the researcher in the evaluation of gene function for the purpose of specific or large-scale analysis. CandidaDB is based on GenoList, a generic relational data schema and a World Wide Web interface that has been adapted to the handling of eukaryotic genomes. The interface allows users to browse easily through genome data and retrieve information. CandidaDB also provides more elaborate tools, such as pattern searching, that are tightly connected to the overall browsing system. As the C.albicans genome is diploid and still incompletely assembled, CandidaDB provides tools to browse the genome by individual supercontigs and to examine information about allelic sequences obtained from complementary contigs. CandidaDB is accessible at http://genolist.pasteur.fr/CandidaDB
CandidaDB: A genome database for Candida albicans pathogenomics
CandidaDB is a database dedicated to the genome of the most prevalent systemic fungal pathogen of humans, Candida albicans. CandidaDB is based on an annotation of the Stanford Genome Technology Center C.albicans genome sequence data by the European Galar Fungail Consortium. CandidaDB Release 2.0 (June 2004) contains information pertaining to Assembly 19 of the genome of C.albicans strain SC5314. The current release contains 6244 annotated entries corresponding to 130 tRNA genes and 5917 protein-coding genes. For these, it provides tentative functional assignments along with numerous pre-run analyses that can assist the researcher in the evaluation of gene function for the purpose of specific or large-scale analysis. CandidaDB is based on GenoList, a generic relational data schema and a World Wide Web interface that has been adapted to the handling of eukaryotic genomes. The interface allows users to browse easily through genome data and retrieve information. CandidaDB also provides more elaborate tools, such as pattern searching, that are tightly connected to the overall browsing system. As the C.albicans genome is diploid and still incompletely assembled, CandidaDB provides tools to browse the genome by individual supercontigs and to examine information about allelic sequences obtained from complementary contigs. CandidaDB is accessible at http://genolist.pasteur.fr/CandidaDB. Sequence data from C.albicans were obtained from the Stanford Genome Technology Center (http://www.sequence.stanford.edu/group/candida). Sequencing of C.albicans was accomplished with the support of the NIDR and the Burroughs Wellcome Fund. This work was supported by grants from the European Commission (QLK2-2000-00795; MCRTN-CT-2003-504148; ‘Galar Fungail Consortium’) to A.J.P.B., C.E., A.D., J.E., C.G., B.H., F.M.K., J.P.M. and R.S. and the Ministere de la Recherche et de la Technologie (PRFMMIP ‘Réseau Infections Fongiques’) to C.E. and C.G. F.T. was supported by the Institut Pasteur Strategic Horizontal Program on Anopheles gambiae. N.M. was supported by a fellowship of the Junta de Castilla y Leon and by grants DGCYT (PM-98-0317 and BIO 2002-02124) to A.D. R.S. was supported in part by grants from the Spanish Ministerio de Ciencia y Tecnologia (BMC2003-01023) and Agencia Valenciana de Ciencia i Tecnologia de la Generalitat Valenciana (Grupos 03/187)
Natural History, Microbes and Sequences: Shouldn't We Look Back Again to Organisms?
The discussion on the existence of prokaryotic species is reviewed. The demonstration that several different mechanisms of genetic exchange and recombination exist has led some to a radical rejection of the possibility of bacterial species and, in general, the applicability of traditional classification categories to the prokaryotic domains. However, in spite of intense gene traffic, prokaryotic groups are not continuously variable but form discrete clusters of phenotypically coherent, well-defined, diagnosable groups of individual organisms. Molecularization of life sciences has led to biased approaches to the issue of the origins of biodiversity, which has resulted in an increasingly widespread tendency to emphasize genes and sequences and not give proper attention to organismal biology. As argued here, molecular and organismal approaches should be seen as complementary and not opposed views of biology
A novel series of compositionally biased substitution matrices for comparing Plasmodium proteins
Background: The most common substitution matrices currently used (BLOSUM and PAM) are based on protein sequences with average amino acid distributions, thus they do not represent a fully accurate substitution model for proteins characterized by a biased amino acid composition. This problem has been addressed recently by adjusting existing matrices, however, to date, no empirical approach has been taken to build matrices which offer a substitution model for comparing proteins sharing an amino acid compositional bias. Here, we present a novel procedure to construct series of symmetrical substitution matrices to align proteins from similarly biased Plasmodium proteomes. Results: We generated substitution matrices by selecting from the BLOCKS database those multiple alignments with a compositional bias similar to that of P. falciparum and P. yoelii proteins. A novel 'fuzzy' clustering method was adopted to group sequences within these alignments, showing that this method retains more complete information on the amino acid substitutions when compared to hierarchical clustering. We assessed the performance against the BLOSUM62 series and showed that the usage of our matrices results in an improvement in the performance of BLAST database searches, greatly reducing the number of false positive hits. We then demonstrated applications of the use of novel matrices to improve the annotation of homologs between the two Plasmodium species and to classify members of the P. falciparum RIFIN/STEVOR family. Conclusion: We confirmed that in the case of compositionally biased proteins, standard BLOSUM matrices are not suited for optimal alignments, and specific substitution matrices are required. In addition, we showed that the usage of these matrices leads to a reduction of false positive hits, facilitating the automatic annotation process.
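For readers unfamiliar with how BLOSUM-style matrices are derived from aligned blocks, the sketch below computes log-odds scores in half-bit units from ungapped alignment columns. It shows only the standard observed-versus-expected pair frequency calculation. It omits sequence clustering and weighting entirely, so it is not the fuzzy clustering procedure described in the abstract, and the function name `log_odds_scores` is hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def log_odds_scores(columns):
    """Return {(a, b): score} in half-bit units from alignment columns.

    `columns` is a list of strings, one per ungapped alignment column.
    Pairs that are never observed receive no score in this sketch.
    """
    pair_counts = Counter()
    for column in columns:
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    if total == 0:
        return {}
    # Observed frequency q_ab of each unordered residue pair.
    q = {pair: c / total for pair, c in pair_counts.items()}
    # Background frequency p_a implied by the observed pairs.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2
    scores = {}
    for (a, b), freq in q.items():
        # Expected pair frequency under independence.
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = 2 * log2(freq / expected)
    return scores


if __name__ == "__main__":
    print(log_odds_scores(["AAAA", "BBBB"]))
```

With two fully conserved columns, as in the usage line, each identity pair is observed twice as often as expected by chance, so both A-A and B-B score 2.0 half-bits.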
Beyond representing orthology relations by trees
Reconstructing the evolutionary past of a family of genes is an important aspect of many genomic studies. To help with this, simple relations on a set of sequences called orthology relations may be employed. In addition to being interesting from a practical point of view they are also attractive from a theoretical perspective in that e.g. a characterization is known for when such a relation is representable by a certain type of phylogenetic tree. For an orthology relation inferred from real biological data it is however generally too much to hope for that it satisfies that characterization. Rather than trying to correct the data in some way or another which has its own drawbacks, as an alternative, we propose to represent an orthology relation in terms of a structure more general than a phylogenetic tree called a phylogenetic network. To compute such a network in the form of a level-1 representation for such a relation, we formalize an orthology relation in terms of the novel concept of a symbolic 3-dissimilarity which is motivated by the biological concept of a "cluster of orthologous groups", or COG for short. For such maps which assign symbols rather than real values to elements, we introduce the novel Network-Popping algorithm which has several attractive properties. In addition, we characterize an orthology relation on some set that has a level-1 representation in terms of eight natural properties for that relation as well as in terms of level-1 representations of orthology relations on certain subsets of that set
- …