4,006 research outputs found

Protein and DNA sequence determinants of thermophilic adaptation

Author: Eugene I Shakhnovich
Igor N Berezovsky
Konstantin B Zeldovich
Philip E Bourne
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2005

Prokaryotes living at extreme environmental temperatures exhibit pronounced signatures in the amino acid composition of their proteins and the nucleotide composition of their genomes that reflect adaptation to their thermal environments. However, despite significant efforts, a definitive answer as to which genomic and proteomic compositional features determine the Optimal Growth Temperature of prokaryotic organisms has remained elusive. Here the authors performed a comprehensive analysis of amino acid and nucleotide compositional signatures of thermophilic adaptation by exhaustively evaluating all combinations of amino acids and nucleotides as possible determinants of Optimal Growth Temperature for all prokaryotic organisms with fully sequenced genomes. The authors discovered that the total concentration of seven amino acids in proteomes, IVYWREL, serves as a universal proteomic predictor of Optimal Growth Temperature in prokaryotes. Resolving the long-standing controversy, the authors determined that the variation in nucleotide composition (an increase of purine load, or A+G content, with temperature) is largely a consequence of thermal adaptation of proteins. However, the frequency with which A and G nucleotides appear as nearest neighbors in genome sequences is strongly and independently correlated with Optimal Growth Temperature, as a result of codon bias in the corresponding genomes. Together these results provide a complete picture of proteomic and genomic determinants of thermophilic adaptation. Comment: in press PLoS Computational Biology; revised version
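
As a rough illustration of two compositional quantities named in this abstract, the Python sketch below computes the pooled IVYWREL fraction of a set of protein sequences and the purine (A+G) fraction of a nucleotide sequence. The function names and the toy sequences are hypothetical, and the sketch does not reproduce how the authors relate these quantities to Optimal Growth Temperature.

```python
from typing import Iterable

# The seven amino acids named in the abstract (one-letter codes).
IVYWREL = frozenset("IVYWREL")
PURINES = frozenset("AG")


def ivywrel_fraction(proteins: Iterable[str]) -> float:
    """Pooled fraction of I, V, Y, W, R, E and L residues across all sequences."""
    total = 0
    hits = 0
    for protein in proteins:
        seq = protein.upper()
        total += len(seq)
        hits += sum(aa in IVYWREL for aa in seq)
    if total == 0:
        raise ValueError("no residues supplied")
    return hits / total


def purine_fraction(dna: str) -> float:
    """Fraction of A and G nucleotides (the 'purine load') in one strand."""
    seq = dna.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(nt in PURINES for nt in seq) / len(seq)


if __name__ == "__main__":
    # Toy inputs, not real proteome or genome data.
    print(ivywrel_fraction(["IVYWREL", "AAA"]))
    print(purine_fraction("AGCT"))
```

Pooling residues over the whole proteome, rather than averaging per-protein fractions, is a design choice made here to match the abstract's phrase "total concentration ... in proteomes"; it is an assumption about the intended definition.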

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS

Defining a core genome for the Herpesvirales and exploring their evolutionary relationship with the Caudovirales

Author: Andrade-Martínez Juan S.
Moreno-Gallego J. Leonardo
Reyes Alejandro
Publication venue: Digital Commons@Becker
Publication date: 01/01/2019

Digital Commons@Becker

PRED-CLASS: cascading neural networks for generalized protein classification and genome-wide applications

Author: Adams
Aloy
Apweiler
Bairoch
Brenner
Fariselli
Fariselli
Finkelstein
Fischer
Hamodrakas
Haykin
Hobohm
Jones
Koehl
Levitt
Minsky
Murzin
Murzin
Murzin
Möeller
Nielsen
Pasquier
Pasquier
Pasquier
Pedersen
Petersen
Rice
Rost
Rost
Rumelhart
Sanchez
Schulz
Stevens
Vlahou
Wallin
Publication venue: 'Wiley'
Publication date: 01/01/2001

A cascading system of hierarchical, artificial neural networks (named PRED-CLASS) is presented for the generalized classification of proteins into four distinct classes (transmembrane, fibrous, globular, and mixed) from information solely encoded in their amino acid sequences. The architecture of the individual component networks is kept very simple, reducing the number of free parameters (network synaptic weights) for faster training, improved generalization, and the avoidance of data overfitting. Capturing information from as few as 50 protein sequences spread among the four target classes (6 transmembrane, 10 fibrous, 13 globular, and 17 mixed), PRED-CLASS was able to obtain 371 correct predictions out of a set of 387 proteins (success rate approximately 96%) unambiguously assigned into one of the target classes. The application of PRED-CLASS to several test sets and complete proteomes of several organisms demonstrates that such a method could serve as a valuable tool in the annotation of genomic open reading frames with no functional assignment or as a preliminary step in fold recognition and ab initio structure prediction methods. Detailed results obtained for various data sets and completed genomes, along with a web server running the PRED-CLASS algorithm, can be accessed over the World Wide Web at http://o2.biol.uoa.gr/PRED-CLAS

arXiv.org e-Print Archive

Crossref

The Plant Short-Chain Dehydrogenase (SDR) superfamily: genome-wide inventory and diversification patterns

Author: Kallberg Yvonne
Moummou Hanane
Persson Bengt
Tonfack Libert Brice
Van der Rest Benoît
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Background: Short-chain dehydrogenases/reductases (SDRs) form one of the largest and oldest NAD(P)(H)-dependent oxidoreductase families. Despite a conserved 'Rossmann-fold' structure, members of the SDR superfamily exhibit low sequence similarities, which constituted a bottleneck in terms of identification. Recent classification methods, relying on hidden Markov models (HMMs), improved identification and enabled the construction of a nomenclature. However, functional annotations of plant SDRs remain scarce. Results: Wide-scale analyses were performed on ten plant genomes. The combination of HMM-based analyses and similarity searches led to the construction of an exhaustive inventory of plant SDRs. With 68 to 315 members found in each analysed genome, the inventory confirmed the over-representation of SDRs in plants compared to animals, fungi and prokaryotes. The plant SDRs were first classified into three major types ('classical', 'extended' and 'divergent'), but a minority (10% of the predicted SDRs) could not be classified into these general types ('unknown' or 'atypical' types). In a second step, we could categorize the vast majority of land plant SDRs into a set of 49 families. Out of these 49 families, 35 appeared early during evolution since they are commonly found throughout the Green Lineage. Yet some SDR families have undergone significant functional diversification within vascular plants since they diverged from Bryophytes: tropinone reductase-like proteins (SDR65C), 'ABA2-like'-NAD dehydrogenase (SDR110C), 'salutaridine/menthone-reductase-like' proteins (SDR114C), 'dihydroflavonol 4-reductase'-like proteins (SDR108E) and 'isoflavone-reductase-like' proteins (SDR460A). Interestingly, these diversified families are either involved in secondary metabolism routes (terpenoids, alkaloids, phenolics) or participate in developmental processes (hormone biosynthesis or catabolism, flower development), in contrast to SDR families involved in primary metabolism, which are poorly diversified. Conclusion: The application of HMMs to plant genomes enabled us to identify 49 families that encompass all angiosperm ('higher plant') SDRs, each family being sufficiently conserved to enable simpler analyses based only on overall sequence similarity. The multiplicity of SDRs in the plant kingdom is mainly explained by the diversification of large families involved in different secondary metabolism pathways, suggesting that the chemical diversification that accompanied the emergence of vascular plants acted as a driving force for SDR evolution.

Publikationer från Linköpings universitet

Crossref

Springer - Publisher Connector

Open Archive Toulouse Archive Ouverte

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line

ProdInra

Genomic and proteomic biases inform metabolic engineering strategies for anaerobic fungi.

Author: Albà
Arazoe
Atasoglu
Bach
Beckham
Bezanson
Birdsell
Boch
Bonugli-Santos
Brownlee
Calkins
Camacho
Camiolo
Carlson
Chan
Chen
Cheng
Cheng
Chokhawala
Coker
Deshpande
Diener
Dollhofer
Duarte
Durand
Duret
Fondon
Galtier
Gasiunas
Gentzsch
Gerngross
Glass
Glémin
Greene
Grigoriev
Haitjema
Haitjema
Hamilton
Hanafy
Hartfield
Henske
Hershberg
Hildebrand
Hull
Jiang
Karlin
Kiktev
Kleinstiver
Kleinstiver
Knauer
Knight
Kuyper
Leberer
Li
Liggenstoffer
Liu
Magee
Mertens
Meunier
Morrison
Murphy
Nicholson
Nieuwenhuis
Nørholm
Orpin
Oyola
O’Malley
Podolsky
Raymond
Reichenberger
Ropars
Sadhu
Sammond
Sekowska
Seppälä
Seppälä
Solieri
Solomon
Sonan
Staben
Steensels
Steensels
Sukumaran
Theodorou
Videvall
Wang
Wang
Wright
Wu
Ximenes
Youssef
Zetsche
Publication venue: eScholarship, University of California
Publication date: 01/06/2020

Anaerobic fungi (Neocallimastigomycota) are emerging non-model hosts for biotechnology due to their wealth of biomass-degrading enzymes, yet tools to engineer these fungi have not yet been established. Here, we show that the anaerobic gut fungi have the most GC-depleted genomes among 443 sequenced organisms in the fungal kingdom, which has ramifications for heterologous expression of genes as well as for emerging CRISPR-based genome engineering approaches. Comparative genomic analyses suggest that anaerobic fungi may contain cellular machinery to aid in sexual reproduction, yet a complete mating pathway was not identified. Predicted proteomes of the anaerobic fungi also contain an unusually large fraction of proteins with homopolymeric amino acid runs consisting of five or more identical consecutive amino acids. In particular, threonine runs are especially enriched in anaerobic fungal carbohydrate-active enzymes (CAZymes) and this, together with a high abundance of predicted N-glycosylation motifs, suggests that gut fungal CAZymes are heavily glycosylated, which may impact heterologous production of these biotechnologically useful enzymes. Finally, we present a codon optimization strategy to aid in the development of genetic engineering tools tailored to these early-branching anaerobic fungi.
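
The two sequence statistics this abstract relies on, GC content and homopolymeric runs of five or more identical residues, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the toy sequences are hypothetical, and the authors' actual pipeline is not described in the abstract.

```python
import re
from typing import List, Tuple

# One character followed by at least four copies of itself:
# a run of five or more identical residues (threshold taken from the abstract).
HOMOPOLYMER_RUN = re.compile(r"(.)\1{4,}")


def homopolymer_runs(protein: str) -> List[Tuple[str, int, int]]:
    """Return (residue, start index, run length) for every run of >= 5 identical residues."""
    seq = protein.upper()
    return [(m.group(1), m.start(), len(m.group(0)))
            for m in HOMOPOLYMER_RUN.finditer(seq)]


def gc_content(dna: str) -> float:
    """Fraction of G and C nucleotides in a sequence."""
    seq = dna.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(nt in "GC" for nt in seq) / len(seq)


if __name__ == "__main__":
    # Toy inputs: one six-residue threonine run, and a four-alanine run
    # that falls below the five-residue threshold.
    print(homopolymer_runs("MKTTTTTTSAAAAQ"))
    print(gc_content("ATGC"))
```

A protein would count toward the "fraction of proteins with homopolymeric runs" whenever `homopolymer_runs` returns a non-empty list for it.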

Crossref

eScholarship - University of California

Phylogenetic differences in content and intensity of periodic proteins

Author: Gatherer D.
McEwan N.R.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2005

Many proteins exhibit sequence periodicity, often correlated with a visible structural periodicity. The statistical significance of such periodicity can be assessed by means of a chi-square-based test, with significance thresholds being calculated from shuffled sequences. Comparison of the complete proteomes of 45 species reveals striking differences in the proportion of periodic proteins and the intensity of the most significant periodicities. Eukaryotes tend to have a higher proportion of periodic proteins than eubacteria, which in turn tend to have more than archaea. The intensity of periodicity in the most periodic proteins is also greatest in eukaryotes. By contrast, the relatively small group of periodic proteins in archaea also tends to be weakly periodic compared to those of eukaryotes and eubacteria. Exceptions to this general rule are found in those prokaryotes with multicellular life-cycle phases, e.g. Methanosarcina spp. or Anabaena spp., which have more periodicities than prokaryotes in general, and in unicellular eukaryotes, which have fewer than multicellular eukaryotes. Significantly periodic proteins in eukaryotes are distributed over a wide range of period lengths, whereas prokaryotic proteins typically have a more limited set of period lengths. This is further investigated by repeating the analysis on the NRL-3D database of proteins of solved structure. Some short-range periodicities are explicable in terms of basic secondary structure, e.g. alpha helices, while middle-range periodicities are frequently found to consist of known short Pfam domains, e.g. leucine-rich repeats, tetratricopeptides or armadillo domains. However, not all can be explained in this way.
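
The abstract does not spell out the exact statistic, so the following Python sketch shows only one plausible form of a chi-square periodicity test with a shuffle-based significance estimate. It assumes a residue-by-phase contingency table (phase = position modulo the candidate period) and an empirical p-value from shuffled copies of the sequence; the function names, the default of 200 shuffles, and the toy sequence are all hypothetical.

```python
import random
from collections import Counter


def periodicity_chi2(seq: str, period: int) -> float:
    """Chi-square statistic for association between residue type and position mod period."""
    n = len(seq)
    if period < 2 or n < period:
        raise ValueError("period must be >= 2 and no longer than the sequence")
    observed = Counter((aa, i % period) for i, aa in enumerate(seq))
    row_totals = Counter(seq)                              # count of each residue type
    col_totals = Counter(i % period for i in range(n))     # count of positions in each phase
    chi2 = 0.0
    for aa, r in row_totals.items():
        for phase, c in col_totals.items():
            expected = r * c / n
            chi2 += (observed[(aa, phase)] - expected) ** 2 / expected
    return chi2


def shuffled_p_value(seq: str, period: int, n_shuffles: int = 200, seed: int = 0) -> float:
    """Empirical p-value: share of shuffled sequences scoring at least as high as the original."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = periodicity_chi2(seq, period)
    residues = list(seq)
    at_least_as_high = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        if periodicity_chi2("".join(residues), period) >= observed:
            at_least_as_high += 1
    return (at_least_as_high + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    toy = "LAAAAAA" * 10  # artificial sequence with a leucine every 7th position
    print(periodicity_chi2(toy, 7), shuffled_p_value(toy, 7))
    print(periodicity_chi2(toy, 5))
```

Shuffling preserves the amino acid composition of the sequence while destroying positional order, which is why it gives a composition-matched null distribution for the statistic.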

Enlighten

Lancaster E-Prints

Predicting protein function by machine learning on amino acid sequences – a critical evaluation

Author: Al-Shahib A
Breitling R
Gilbert D
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

Copyright © 2007 Al-Shahib et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Background: Predicting the function of newly discovered proteins by simply inspecting their amino acid sequence is one of the major challenges of post-genomic computational biology, especially when done without recourse to experimentation or homology information. Machine learning classifiers are able to discriminate between proteins belonging to different functional classes. Until now, however, it has been unclear if this ability would be transferable to proteins of unknown function, which may show distinct biases compared to experimentally more tractable proteins. Results: Here we show that proteins with known and unknown function do indeed differ significantly. We then show that proteins from different bacterial species also differ to an even larger and very surprising extent, but that functional classifiers nonetheless generalize successfully across species boundaries. We also show that in the case of highly specialized proteomes, classifiers from a different, but more conventional, species may in fact outperform the endogenous species-specific classifier. Conclusion: We conclude that there is a very good prospect of successfully predicting the function of as yet uncharacterized proteins using machine learning classifiers trained on proteins of known function.

University of Groningen

University of Birmingham Research Portal

Directory of Open Access Journals

Enlighten

The University of Manchester - Institutional Repository

Brunel University Research Archive

Crossref

Proceedings - University of Groningen

Springer - Publisher Connector

ARTS repository - University of Groningen

PubMed Central

University of Groningen Digital Archive