Springer - Publisher Connector

Feature amplified voting algorithm for functional analysis of protein superfamily

Author: Chang Chih-Hung
Chung Yeh-Ching
Hung Che-Lun
Lee Chihan
Lin Chun-Yuan
Yi Tang Chuan
Publication venue: BioMed Central
Publication date: 01/01/2010

Springer - Publisher Connector

Translational Selection Is Ubiquitous in Prokaryotes

Author: A Carbone
A Wagner
AM Resch
AV Glyakina
B Lafay
C Kimchi-Sarfaty
C Nadeau
C Rispe
DC Hess
EP Rocha
EV Koonin
F Meier
F Supek
Fran Supek
G Perriere
H Charles
H Grosjean
H Roy
H Suzuki
HS Najafabadi
J Mrazek
J Rozenski
JA Ranea
JC Marioni
JD Selengut
Jelena Repar
JG Lawrence
JH McDonald
JL Bennetzen
JL Parmley
JO McInerney
JR Lobry
JT Herbeck
K Chen
K Mizuguchi
KA Dittmar
KB Zeldovich
Kristian Vlahoviček
L Breiman
L Dethlefsen
LC Seaver
M Ashburner
M dos Reis
M Neuhauser
M Oresic
MG Langille
MK Kruger
N Molina
N Stoletzki
Nancy A. Moran
Nives Škunca
P Lu
PF Agris
PM Sharp
PP Chan
R Hershberg
RD Knight
RJ Grocock
RL Tatusov
RM Weiner
S D'Amico
S Kanaya
S Karlin
SL Chen
T Banerjee
T Fawcett
Tomislav Šmuc
V Daubin
X Xia
Y Ishihama
Publication venue: Public Library of Science
Publication date: 01/01/2010

Codon usage bias in prokaryotic genomes is largely a consequence of background substitution patterns in DNA, but highly expressed genes may show a preference towards codons that enable more efficient and/or accurate translation. We introduce a novel approach based on supervised machine learning that detects effects of translational selection on genes, while controlling for local variation in nucleotide substitution patterns represented as sequence composition of intergenic DNA. A cornerstone of our method is a Random Forest classifier that outperformed previous distance measure-based approaches, such as the codon adaptation index, in the task of discerning the (highly expressed) ribosomal protein genes by their codon frequencies. Unlike previous reports, we show evidence that translational selection in prokaryotes is practically universal: in 460 of 461 examined microbial genomes, we find that a subset of genes shows a higher codon usage similarity to the ribosomal proteins than would be expected from the local sequence composition. These genes constitute a substantial part of the genome—between 5% and 33%, depending on genome size—while also exhibiting higher experimentally measured mRNA abundances and tending toward codons that match tRNA anticodons by canonical base pairing. Certain gene functional categories are generally enriched with, or depleted of codon-optimized genes, the trends of enrichment/depletion being conserved between Archaea and Bacteria. Prominent exceptions from these trends might indicate genes with alternative physiological roles; we speculate on specific examples related to detoxication of oxygen radicals and ammonia and to possible misannotations of asparaginyl–tRNA synthetases. 
Since the presence of codon optimizations on genes is a valid proxy for expression levels in fully sequenced genomes, we provide an example of an “adaptome” by highlighting gene functions with expression levels elevated specifically in thermophilic Bacteria and Archaea.
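The classifier described in this abstract works on codon frequencies. As a minimal sketch of that input representation only (not the authors' pipeline, which also controls for intergenic sequence composition), the following turns a coding sequence into a 64-element relative codon frequency vector. The function name `codon_frequency_vector` and the fixed codon ordering are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed, reproducible order (AAA, AAC, ..., TTT).
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequency_vector(cds: str) -> list[float]:
    """Return relative codon frequencies of a coding sequence.

    Hypothetical helper for illustration. Codons containing characters
    other than A, C, G, T are ignored, and a trailing partial codon is
    dropped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # length rounded down to whole codons
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in CODONS)
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


# Toy sequence: ATG AAA AAA TAA
example_vector = codon_frequency_vector("ATGAAAAAATAA")
print(example_vector[CODONS.index("AAA")])
```

A vector like this, computed per gene, could be passed to any standard classifier together with a label such as "ribosomal protein gene or not".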

CiteSeerX

Public Library of Science (PLOS)

Accurate Protein Structure Annotation through Competitive Diffusion of Enzymatic Functions over a Network of Local Evolutionary Similarities

Author: A Arakaki
A Ribes-Zamora
A Vazquez
AD Wilkins
AM Schnoes
Andreas Martin Lisewski
B Adamcsek
BE Engelhardt
Christos Ouzounis
CT Porter
D Barrell
D Warde-Farley
D Zhou
DE Almonacid
DM Kristensen
DS Glazer
E Levy
E Nabieva
EM Marcotte
Eric Venner
F Baameur
F Ferre
F Glaser
F Pazos
G Bader
GJ Rodriguez
H Hishigaki
H Kobayashi
H Shin
H Yao
HJ Atkinson
HN Chua
I Friedberg
I Lee
I Mihalek
J Byun
J Chandonia
J Rhee
J Song
J Westbrook
JA Capra
JD Watson
JJ Mukherjee
K Krisch
K Tsuda
K Wang
L Holm
L Jaroszewski
L Rajagopalan
LH Greene
M Deng
M Larkin
ME Sowa
MEJ Newman
MI Sadowski
MK Ross
MM Bonde
N Furnham
N Nariai
ND Gold
O Lichtarge
OC Redfern
Olivier Lichtarge
P Gu
P Hu
PA Alexander
PC Wu
PF Gherardini
R Onrust
R Sharan
R She
R. Matthew Ward
RA Chiang
RA Laskowski
RM Ward
S Altschul
S Erdin
S Hennig
S Madabushi
SB Pandit
SD Copley
SE Brenner
Serkan Erdin
SF Altschul
Shivas R. Amin
SK Shenoy
SR Collins
SR Gill
T Hsiao
V van Noort
X Quan
Y Qi
YY Tseng
Publication venue: Public Library of Science
Publication date: 13/12/2010

High-throughput Structural Genomics yields many new protein structures without known molecular function. This study aims to uncover these missing annotations by globally comparing select functional residues across the structural proteome. First, Evolutionary Trace Annotation, or ETA, identifies which proteins have local evolutionary and structural features in common; next, these proteins are linked together into a proteomic network of ETA similarities; then, starting from proteins with known functions, competing functional labels diffuse link-by-link over the entire network. Every node is thus assigned a likelihood z-score for every function, and the most significant one at each node wins and defines its annotation. In high-throughput controls, this competitive diffusion process recovered enzyme activity annotations with 99% and 97% accuracy at half-coverage for the third and fourth Enzyme Commission (EC) levels, respectively. This corresponds to false positive rates 4-fold lower than nearest-neighbor and 5-fold lower than sequence-based annotations. In practice, experimental validation of the predicted carboxylesterase activity in a protein from Staphylococcus aureus illustrated the effectiveness of this approach in the context of an increasingly drug-resistant microbe. This study further links molecular function to a small number of evolutionarily important residues recognizable by Evolutionary Tracing and it points to the specificity and sensitivity of functional annotation by competitive global network diffusion. A web server is at http://mammoth.bcm.tmc.edu/networks
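The competitive diffusion idea in this abstract (labels spreading link by link from annotated proteins, with the strongest label winning at each node) can be sketched on a toy graph. This is a simplified illustration, not the published method: it uses a plain iterative label-propagation update and picks the label with the highest raw score, whereas the abstract describes likelihood z-scores. The function name, the `alpha` damping parameter, the node names, and the labels "esterase" and "kinase" are all assumptions.

```python
def diffuse_labels(edges, seeds, alpha=0.8, iterations=50):
    """Spread seed labels over a weighted graph and pick a winner per node.

    edges: dict mapping node -> {neighbor: weight}; every neighbor must
           also appear as a key (symmetric adjacency).
    seeds: dict mapping some nodes -> known label.
    Returns (winning label per node, per-node score dict).
    """
    nodes = sorted(edges)
    labels = sorted(set(seeds.values()))
    # Seed nodes start with score 1.0 for their own label, 0.0 otherwise.
    init = {n: {l: 1.0 if seeds.get(n) == l else 0.0 for l in labels}
            for n in nodes}
    scores = {n: dict(init[n]) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            total_w = sum(edges[n].values())
            updated[n] = {}
            for l in labels:
                # Weighted average of neighbor scores for this label.
                neighbor_avg = (
                    sum(w * scores[m][l] for m, w in edges[n].items()) / total_w
                    if total_w else 0.0
                )
                # Mix diffused score with the node's own seed evidence.
                updated[n][l] = alpha * neighbor_avg + (1 - alpha) * init[n][l]
        scores = updated
    winners = {n: max(labels, key=lambda l: scores[n][l]) for n in nodes}
    return winners, scores


# Toy chain A - B - C - D - E with a weak C-D link; A and E are annotated.
graph = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 0.2},
    "D": {"C": 0.2, "E": 1.0},
    "E": {"D": 1.0},
}
known = {"A": "esterase", "E": "kinase"}
winners, scores = diffuse_labels(graph, known)
print(winners)
```

In this toy case node C is pulled toward the label of A because its link to B is stronger than its link to D, which is the competitive aspect the abstract describes.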

Computational prediction of promotors in Agrobacterium tumefaciens strain C58 by using the machine learning technique

Author: Bakanina Kissanga Grace-Mercure
Farwa Hassan
Fen Liu
Hasan Zulfiqar
Zahoor Ahmed
Zhao-Yue Zhang
Publication venue: Frontiers Media SA
Publication date: 01/04/2023

Promoters are genomic regions upstream of genes that RNA polymerase binds to initiate transcription. Because promoters are the most critical elements of gene expression, recognizing them is crucial to understanding how gene expression is regulated. This study aimed to develop a machine learning-based model to predict promoters in Agrobacterium tumefaciens (A. tumefaciens) strain C58. In the model, promoter sequences were encoded by three kinds of feature descriptors: accumulated nucleotide frequency, k-mer nucleotide composition, and binary encoding. The resulting features were optimized using correlation and an mRMR-based algorithm. The optimized features were fed into a random forest (RF) classifier to discriminate promoter sequences from non-promoter sequences in A. tumefaciens strain C58. In 10-fold cross-validation, the proposed model yielded an overall accuracy of 0.837. This model will aid the study of promoters in A. tumefaciens strain C58.
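The three feature descriptors named in this abstract can be illustrated with a short sketch. The abstract does not define them, so the definitions below are assumptions based on how these encodings are commonly described: k-mer composition as normalized counts of overlapping k-mers, binary encoding as a 4-bit one-hot code per base, and accumulated nucleotide frequency as the density of the current base within the sequence prefix ending at that position. All function names are hypothetical.

```python
from itertools import product

ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
           "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}


def kmer_composition(seq: str, k: int = 2) -> list[float]:
    """Normalized counts of overlapping k-mers, in a fixed ACGT order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(max(len(seq) - k + 1, 0)):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def binary_encoding(seq: str) -> list[int]:
    """Concatenated one-hot vectors; unknown bases become all zeros."""
    encoded = []
    for base in seq.upper():
        encoded.extend(ONE_HOT.get(base, [0, 0, 0, 0]))
    return encoded


def accumulated_nucleotide_frequency(seq: str) -> list[float]:
    """For each position i, the fraction of seq[:i+1] equal to seq[i]."""
    seq = seq.upper()
    seen = {}
    densities = []
    for i, base in enumerate(seq, start=1):
        seen[base] = seen.get(base, 0) + 1
        densities.append(seen[base] / i)
    return densities


toy = "AAAC"  # illustrative sequence, not real promoter data
features = (kmer_composition(toy, k=2)
            + binary_encoding(toy)
            + accumulated_nucleotide_frequency(toy))
print(len(features))
```

Concatenating the three encodings gives one feature vector per sequence, which a feature-selection step and a classifier could then consume.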

Predicting changes in protein thermostability brought about by single- or multi-site mutations

Author: Chu Xiaoyu
Fan Yunliu
Tian Jian
Wu Ningfeng
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: An important aspect of protein design is the ability to predict changes in protein thermostability arising from single- or multi-site mutations. Protein thermostability is reflected in the change in free energy (ΔΔG) of thermal denaturation. Results: We have developed predictive software, Prethermut, based on machine learning methods, to predict the effect of single- or multi-site mutations on protein thermostability. The input vector of Prethermut is based on known structural changes and empirical measurements of changes in potential energy due to protein mutations. In a 10-fold cross-validation test on the M-dataset, consisting of 3366 mutant proteins from ProTherm, the classification accuracy of random forests and the regression accuracy of random forest regression were slightly better than those of support vector machines and support vector regression; the overall classification accuracy and the Pearson correlation coefficient of the regression were 79.2% and 0.72, respectively. Prethermut performs better on proteins containing multi-site mutations than on those with single mutations. Conclusions: The performance of Prethermut indicates that it is a useful tool for predicting changes in protein thermostability brought about by single- or multi-site mutations and will be valuable in the rational design of proteins.
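The two figures reported in this abstract, classification accuracy and the Pearson correlation coefficient, are standard metrics. A minimal sketch of how each is computed is shown here; the function names are hypothetical and the numbers in the usage lines are made-up illustrative values, not data from the study.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation between measured and predicted values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length inputs with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Made-up labels: 1 = stabilizing mutation, 0 = destabilizing mutation.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))

# Made-up measured vs. predicted free-energy changes.
print(pearson_r([0.5, -1.2, 2.0, -0.3], [0.4, -0.9, 1.5, 0.1]))
```

In a 10-fold cross-validation setting such as the one the abstract describes, these metrics would be computed on the held-out predictions rather than on the training data.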

Springer - Publisher Connector

Adelaide Research & Scholarship

PreAcrs: a machine learning framework for identifying anti-CRISPR proteins

Author: Li F.
Song J.
Wang X.
Zhu L.
Publication venue: BioMed Central
Publication date: 01/01/2022

Published online: 25 October 2022
Background: Anti-CRISPR proteins are potent modulators that inhibit the CRISPR-Cas immunity system and have huge potential in gene editing and gene therapy as a genome-editing tool. Extensive studies have shown that anti-CRISPR proteins are essential for modifying endogenous genes, promoting the RNA-guided binding and cleavage of DNA or RNA substrates. In recent years, identifying and characterizing anti-CRISPR proteins has become a significant research topic in bioinformatics. However, because most anti-CRISPR proteins share little similarity with those currently known, traditional screening methods are time-consuming and inefficient. Machine learning methods could fill this gap with powerful predictive capability and provide a new perspective for anti-CRISPR protein identification. Results: Here, we present a novel machine learning ensemble predictor, called PreAcrs, to identify anti-CRISPR proteins directly from protein sequences. Three features and eight different machine learning algorithms were used to train PreAcrs. PreAcrs outperformed other existing methods and significantly improved the prediction accuracy for identifying anti-CRISPR proteins. Conclusions: In summary, the PreAcrs predictor achieved a competitive performance for predicting new anti-CRISPR proteins in terms of accuracy and robustness. We anticipate PreAcrs will be a valuable tool for researchers to speed up the research process. The source code is available at: https://github.com/Lyn-666/anti_CRISPR.git
Lin Zhu, Xiaoyu Wang, Fuyi Li and Jiangning Song
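The PreAcrs abstract describes an ensemble of several algorithms but does not say how their outputs are combined. One common option, shown here purely as an assumed illustration and not as the PreAcrs implementation, is soft voting: average each model's predicted probability for the positive class and threshold the mean. The function name `soft_vote`, the threshold, and the probabilities are hypothetical.

```python
def soft_vote(model_probabilities, threshold=0.5):
    """Average positive-class probabilities across models, then threshold.

    model_probabilities: one list per model, each holding that model's
    predicted probability for every sample (same sample order).
    Returns (predicted labels, averaged probabilities).
    """
    if not model_probabilities:
        raise ValueError("at least one model is required")
    n_samples = len(model_probabilities[0])
    if any(len(p) != n_samples for p in model_probabilities):
        raise ValueError("all models must score the same number of samples")
    n_models = len(model_probabilities)
    averaged = [
        sum(model[i] for model in model_probabilities) / n_models
        for i in range(n_samples)
    ]
    labels = [int(p >= threshold) for p in averaged]
    return labels, averaged


# Made-up probabilities from three hypothetical models for two proteins.
labels, averaged = soft_vote([[0.9, 0.2], [0.7, 0.4], [0.8, 0.3]])
print(labels)
```

Averaging probabilities tends to smooth out the errors of individual models, which is one reason ensembles are often reported as more robust than their members.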