
    Predicting protein function by machine learning on amino acid sequences – a critical evaluation

    Copyright © 2007 Al-Shahib et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Background: Predicting the function of newly discovered proteins by simply inspecting their amino acid sequence is one of the major challenges of post-genomic computational biology, especially when done without recourse to experimentation or homology information. Machine learning classifiers are able to discriminate between proteins belonging to different functional classes. Until now, however, it has been unclear whether this ability would transfer to proteins of unknown function, which may show distinct biases compared to experimentally more tractable proteins. Results: Here we show that proteins with known and unknown function do indeed differ significantly. We then show that proteins from different bacterial species also differ, to an even larger and very surprising extent, but that functional classifiers nonetheless generalize successfully across species boundaries. We also show that, in the case of highly specialized proteomes, classifiers from a different but more conventional species may in fact outperform the endogenous species-specific classifier. Conclusion: We conclude that there is a very good prospect of successfully predicting the function of as yet uncharacterized proteins using machine learning classifiers trained on proteins of known function.

    Complete genome sequence of Thermaerobacter marianensis type strain (7p75a).

    Thermaerobacter marianensis Takai et al. 1999 is the type species of the genus Thermaerobacter, which belongs to the Clostridiales family Incertae Sedis XVII. The species is of special interest because T. marianensis is an aerobic, thermophilic marine bacterium, originally isolated from the deepest part of the western Pacific Ocean (Mariana Trench) at a depth of 10,897 m. Interestingly, the taxonomic status of the genus has not been clarified until now. The genus Thermaerobacter may represent a very deep group within the Firmicutes or potentially a novel phylum. The 2,844,696 bp long genome, with its 2,375 protein-coding and 60 RNA genes, consists of one circular chromosome and is part of the Genomic Encyclopedia of Bacteria and Archaea project.

    Genomics of an extreme psychrophile, Psychromonas ingrahamii

    © 2008 Riley et al. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The definitive version was published in BMC Genomics 9 (2008): 210, doi:10.1186/1471-2164-9-210. The genome sequence of the sea-ice bacterium Psychromonas ingrahamii 37, which grows exponentially at -12 °C, may reveal features that help to explain how this extreme psychrophile is able to grow at such low temperatures. Determination of the whole genome sequence allows comparison with genes of other psychrophiles and mesophiles. Correspondence analysis of the composition of all P. ingrahamii proteins showed that (1) there are 6 classes of proteins, at least one more than in other bacteria, (2) integral inner membrane proteins are not sharply separated from bulk proteins, suggesting that, overall, they may have a lower hydrophobic character, (3) there is strong opposition between asparagine and the oxygen-sensitive amino acids methionine, arginine, cysteine and histidine, and (4) one of the previously unseen clusters of proteins has a high proportion of "orphan" hypothetical proteins, raising the possibility that these are cold-specific proteins. Annotation of proteins by sequence similarity gave the following results. (1) P. ingrahamii has a large number (61) of regulators of cyclic GDP, suggesting that this bacterium produces an extracellular polysaccharide that may help sequester water or lower the freezing point in the vicinity of the cell. (2) P. ingrahamii has genes for production of the osmolyte, betaine choline, which may balance the osmotic pressure as sea ice freezes. (3) P. ingrahamii has a large number (11) of three-subunit TRAP systems that may play an important role in the transport of nutrients into the cell at low temperatures. (4) Chaperones and stress proteins may play a critical role in transforming nascent polypeptides into 3-dimensional configurations that permit low temperature growth. (5) Metabolic properties of P. ingrahamii were deduced. Finally, a few small sets of proteins of unknown function which may play a role in psychrophily have been singled out as worthy of future study. The results of this genomic analysis provide a springboard for further investigations into mechanisms of psychrophily. A focus on the role of asparagine excess in proteins, targeted phenotypic characterizations and gene expression investigations are needed to ascertain if and how the organism regulates various proteins in response to growth at lower temperatures. MR acknowledges support from DE-FG02-04ER63940. JTS acknowledges the support from the University of Washington NASA NAI program and the NSF Astrobiology IGERT program. TZW acknowledges support from a grant from the Fondation Fourmentin-Guilbert and AD acknowledges support from the European Union BioSapiens Network of Excellence, Grant LSHG CT-2003-50326.