10,409 research outputs found
Predicting protein function with hierarchical phylogenetic profiles: The Gene3D phylo-tuner method applied to eukaryotic Genomes
"Phylogenetic profiling'' is based on the hypothesis that during evolution functionally or physically interacting genes are likely to be inherited or eliminated in a codependent manner. Creating presence-absence profiles of orthologous genes is now a common and powerful way of identifying functionally associated genes. In this approach, correctly determining orthology, as a means of identifying functional equivalence between two genes, is a critical and nontrivial step and largely explains why previous work in this area has mainly focused on using presence-absence profiles in prokaryotic species. Here, we demonstrate that eukaryotic genomes have a high proportion of multigene families whose phylogenetic profile distributions are poor in presence-absence information content. This feature makes them prone to orthology mis-assignment and unsuited to standard profile-based prediction methods. Using CATH structural domain assignments from the Gene3D database for 13 complete eukaryotic genomes, we have developed a novel modification of the phylogenetic profiling method that uses genome copy number of each domain superfamily to predict functional relationships. In our approach, superfamilies are subclustered at ten levels of sequence identity from 30% to 100% - and phylogenetic profiles built at each level. All the profiles are compared using normalised Euclidean distances to identify those with correlated changes in their domain copy number. We demonstrate that two protein families will "auto-tune'' with strong co-evolutionary signals when their profiles are compared at the similarity levels that capture their functional relationship. Our method finds functional relationships that are not detectable by the conventional presence - absence profile comparisons, and it does not require a priori any fixed criteria to define orthologous genes
Global Functional Atlas of \u3cem\u3eEscherichia coli\u3c/em\u3e Encompassing Previously Uncharacterized Proteins
One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans’ biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a “systems-wide” functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins
Inference of protein function and protein linkages in Mycobacterium tuberculosis based on prokaryotic genome organization: a combined computational approach
The genome of Mycobacterium tuberculosis was analyzed using recently developed computational approaches to infer protein function and protein linkages. We evaluated and employed a method to infer genes likely to belong to the same operon, as judged by the nucleotide distance between genes in the same genomic orientation, and combined this method with those of the Rosetta Stone, Phylogenetic Profile and conserved Gene Neighbor computational methods for the inference of protein function
Recommended from our members
Clades of huge phages from across Earth's ecosystems.
Bacteriophages typically have small genomes1 and depend on their bacterial hosts for replication2. Here we sequenced DNA from diverse ecosystems and found hundreds of phage genomes with lengths of more than 200 kilobases (kb), including a genome of 735 kb, which is-to our knowledge-the largest phage genome to be described to date. Thirty-five genomes were manually curated to completion (circular and no gaps). Expanded genetic repertoires include diverse and previously undescribed CRISPR-Cas systems, transfer RNAs (tRNAs), tRNA synthetases, tRNA-modification enzymes, translation-initiation and elongation factors, and ribosomal proteins. The CRISPR-Cas systems of phages have the capacity to silence host transcription factors and translational genes, potentially as part of a larger interaction network that intercepts translation to redirect biosynthesis to phage-encoded functions. In addition, some phages may repurpose bacterial CRISPR-Cas systems to eliminate competing phages. We phylogenetically define the major clades of huge phages from human and other animal microbiomes, as well as from oceans, lakes, sediments, soils and the built environment. We conclude that the large gene inventories of huge phages reflect a conserved biological strategy, and that the phages are distributed across a broad bacterial host range and across Earth's ecosystems
Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
The organization and mining of malaria genomic and post-genomic data is
highly motivated by the necessity to predict and characterize new biological
targets and new drugs. Biological targets are sought in a biological space
designed from the genomic data from Plasmodium falciparum, but using also the
millions of genomic data from other species. Drug candidates are sought in a
chemical space containing the millions of small molecules stored in public and
private chemolibraries. Data management should therefore be as reliable and
versatile as possible. In this context, we examined five aspects of the
organization and mining of malaria genomic and post-genomic data: 1) the
comparison of protein sequences including compositionally atypical malaria
sequences, 2) the high throughput reconstruction of molecular phylogenies, 3)
the representation of biological processes particularly metabolic pathways, 4)
the versatile methods to integrate genomic data, biological representations and
functional profiling obtained from X-omic experiments after drug treatments and
5) the determination and prediction of protein structures and their molecular
docking with drug candidate structures. Progresses toward a grid-enabled
chemogenomic knowledge space are discussed.Comment: 43 pages, 4 figures, to appear in Malaria Journa
Comparative assessment of performance and genome dependence among phylogenetic profiling methods
BACKGROUND: The rapidly increasing speed with which genome sequence data can be generated will be accompanied by an exponential increase in the number of sequenced eukaryotes. With the increasing number of sequenced eukaryotic genomes comes a need for bioinformatic techniques to aid in functional annotation. Ideally, genome context based techniques such as proximity, fusion, and phylogenetic profiling, which have been so successful in prokaryotes, could be utilized in eukaryotes. Here we explore the application of phylogenetic profiling, a method that exploits the evolutionary co-occurrence of genes in the assignment of functional linkages, to eukaryotic genomes. RESULTS: In order to evaluate the performance of phylogenetic profiling in eukaryotes, we assessed the relative performance of commonly used profile construction techniques and genome compositions in predicting functional linkages in both prokaryotic and eukaryotic organisms. When predicting linkages in E. coli with a prokaryotic profile, the use of continuous values constructed from transformed BLAST bit-scores performed better than profiles composed of discretized E-values; the use of discretized E-values resulted in more accurate linkages when using S. cerevisiae as the query organism. Extending this analysis by incorporating several eukaryotic genomes in profiles containing a majority of prokaryotes resulted in similar overall accuracy, but with a surprising reduction in pathway diversity among the most significant linkages. Furthermore, the application of phylogenetic profiling using profiles composed of only eukaryotes resulted in the loss of the strong correlation between common KEGG pathway membership and profile similarity score. Profile construction methods, orthology definitions, ontology and domain complexity were explored as possible sources of the poor performance of eukaryotic profiles, but with no improvement in results. CONCLUSION: Given the current set of completely sequenced eukaryotic organisms, phylogenetic profiling using profiles generated from any of the commonly used techniques was found to yield extremely poor results. These findings imply genome-specific requirements for constructing functionally relevant phylogenetic profiles, and suggest that differences in the evolutionary history between different kingdoms might generally limit the usefulness of phylogenetic profiling in eukaryotes
Predicting Protein Function with Hierarchical Phylogenetic Profiles: The Gene3D Phylo-Tuner Method Applied to Eukaryotic Genomes
“Phylogenetic profiling” is based on the hypothesis that during evolution functionally or physically interacting genes are likely to be inherited or eliminated in a codependent manner. Creating presence–absence profiles of orthologous genes is now a common and powerful way of identifying functionally associated genes. In this approach, correctly determining orthology, as a means of identifying functional equivalence between two genes, is a critical and nontrivial step and largely explains why previous work in this area has mainly focused on using presence–absence profiles in prokaryotic species. Here, we demonstrate that eukaryotic genomes have a high proportion of multigene families whose phylogenetic profile distributions are poor in presence–absence information content. This feature makes them prone to orthology mis-assignment and unsuited to standard profile-based prediction methods. Using CATH structural domain assignments from the Gene3D database for 13 complete eukaryotic genomes, we have developed a novel modification of the phylogenetic profiling method that uses genome copy number of each domain superfamily to predict functional relationships. In our approach, superfamilies are subclustered at ten levels of sequence identity—from 30% to 100%—and phylogenetic profiles built at each level. All the profiles are compared using normalised Euclidean distances to identify those with correlated changes in their domain copy number. We demonstrate that two protein families will “auto-tune” with strong co-evolutionary signals when their profiles are compared at the similarity levels that capture their functional relationship. Our method finds functional relationships that are not detectable by the conventional presence–absence profile comparisons, and it does not require a priori any fixed criteria to define orthologous genes
Nucleus-specific linker histones Hho1 and Mlh1 form distinct protein interactions during growth, starvation and development in Tetrahymena thermophila
Chromatin organization influences most aspects of gene expression regulation. The linker histone H1, along with the core histones, is a key component of eukaryotic chromatin. Despite its critical roles in chromatin structure and function and gene regulation, studies regarding the H1 protein-protein interaction networks, particularly outside of Opisthokonts, are limited. The nuclear dimorphic ciliate protozoan Tetrahymena thermophila encodes two distinct nucleus-specific linker histones, macronuclear Hho1 and micronuclear Mlh1. We used a comparative proteomics approach to identify the Hho1 and Mlh1 protein-protein interaction networks in Tetrahymena during growth, starvation, and sexual development. Affinity purification followed by mass spectrometry analysis of the Hho1 and Mlh1 proteins revealed a non-overlapping set of co-purifying proteins suggesting that Tetrahymena nucleus-specific linker histones are subject to distinct regulatory pathways. Furthermore, we found that linker histones interact with distinct proteins under the different stages of the Tetrahymena life cycle. Hho1 and Mlh1 co-purified with several Tetrahymena-specific as well as conserved interacting partners involved in chromatin structure and function and other important cellular pathways. Our results suggest that nucleus-specific linker histones might be subject to nucleus-specific regulatory pathways and are dynamically regulated under different stages of the Tetrahymena life cycle.York University Librarie
- …