33 research outputs found
Prediction Enhancement of Residue Real-Value Relative Accessible Surface Area in Transmembrane Helical Proteins by Solving the Output Preference Problem of Machine Learning-Based Predictors
α-Helical transmembrane proteins constitute 25% of the entire human
proteome and are difficult targets for high-resolution wet-lab structural
studies, calling for accurate computational predictors. We present a novel
sequence-based method called MemBrain-Rasa to predict relative accessible
surface area (rASA) from primary sequences. MemBrain-Rasa features an
ensemble prediction protocol that combines a statistical machine-learning
engine, trained in the sequential feature space, with a segment-template
similarity-based engine constructed from solved structures and sequence
alignments. We locally constructed a comprehensive database of residue rASA
values from the solved protein 3D structures in the PDB; it is searched for
segment templates that are expected to be structurally similar to segments
of the query sequence. The segment-template-based prediction is then fused
with the support vector regression outputs using knowledge rules. Our
experiments show that pure machine-learning output cannot cover the entire
rASA solution space and suffers a serious prediction preference problem due
to the relatively small number of membrane protein structures available as
training samples. The template-based engine solves this problem well,
resulting in a significant improvement in prediction performance.
MemBrain-Rasa achieves a Pearson correlation coefficient of 0.733 and a
mean absolute error of 13.593 on the benchmark dataset, which are 26.4% and
26.1% better, respectively, than existing predictors. MemBrain-Rasa
represents new progress in the structure modeling of α-helical
transmembrane proteins. MemBrain-Rasa is available at www.csbio.sjtu.edu.cn/bioinf/MemBrain/
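The fusion of the two engines described above can be sketched as a simple per-residue rule. This is a minimal illustrative sketch only: the function name `fuse_rasa`, the similarity cutoff, and the fallback rule are assumptions for illustration, not the paper's actual knowledge rules.

```python
def fuse_rasa(svr_pred, template_pred, template_score, score_cutoff=0.5):
    """Per-residue fusion of machine-learning and template rASA predictions.

    svr_pred       : per-residue rASA values from the regression engine
    template_pred  : per-residue template values (None where no template
                     segment covers the residue)
    template_score : similarity score of the best template per residue
    """
    fused = []
    for svr, tpl, score in zip(svr_pred, template_pred, template_score):
        if tpl is not None and score >= score_cutoff:
            fused.append(tpl)   # trust a sufficiently similar template
        else:
            fused.append(svr)   # otherwise fall back to the SVR output
    # rASA is a percentage, so clamp predictions to the valid [0, 100] range
    return [min(max(v, 0.0), 100.0) for v in fused]
```

The clamp at the end reflects the paper's observation that a pure regression output can leave the valid rASA solution space.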
Signal-3L 2.0: A Hierarchical Mixture Model for Enhancing Protein Signal Peptide Prediction by Incorporating Residue-Domain Cross-Level Features
Signal peptides play key roles in
targeting and translocation of
integral membrane proteins and secretory proteins. However, signal
peptides present several challenges for automatic prediction methods.
One challenge is that it is difficult to discriminate signal peptides
from transmembrane helices, as both the H-region of the peptides and
the transmembrane helices are hydrophobic. Another is that it is difficult
to identify the cleavage site between signal peptides and mature proteins,
as cleavage motifs or patterns are still unclear for most proteins.
To solve these problems and further enhance automatic signal peptide
recognition, we report a new Signal-3L 2.0 predictor. Our new model
is constructed with a hierarchical protocol, where it first determines
the existence of a signal peptide. For this, we propose a new residue-domain
cross-level feature-driven approach, and we demonstrate that protein
functional domain information is particularly useful for discriminating
between the transmembrane helices and signal peptides as they perform
different functions. Next, in order to accurately identify the unique
signal peptide cleavage site along the sequence, we designed a top-down
approach in which a subset of potential cleavage sites is first screened
using statistical learning rules, and a final unique site is then selected
according to its evolutionary conservation score. Because this mixed
approach utilizes both statistical learning and evolutionary analysis, it
shows a strong capacity for recognizing cleavage sites. Signal-3L 2.0 has
been benchmarked on multiple datasets, and the experimental results have
demonstrated its accuracy. The online server is available
at www.csbio.sjtu.edu.cn/bioinf/Signal-3L/
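The top-down, two-step selection described above can be sketched as screening followed by a conservation-based choice. The function names `screen_candidates` and `pick_cleavage_site` and the 0.5 threshold are hypothetical placeholders, not the paper's actual rules or parameters.

```python
def screen_candidates(site_scores, threshold=0.5):
    """Step 1: keep positions whose statistical-learning score passes the cutoff.

    site_scores: dict mapping sequence position -> classifier score.
    """
    return [pos for pos, score in site_scores.items() if score >= threshold]


def pick_cleavage_site(candidates, conservation):
    """Step 2: select the unique site with the highest conservation score.

    conservation: dict mapping position -> evolutionary conservation score.
    """
    return max(candidates, key=lambda pos: conservation[pos])
```

Screening first keeps the conservation comparison restricted to sites that are already statistically plausible, mirroring the mixed statistical/evolutionary design of the method.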
In the strategy, the relationship between the parameter and the problem size under different population sizes.
Adaptive Firefly Algorithm: Parameter Analysis and its Application
As a nature-inspired search algorithm, the firefly algorithm (FA) has several control parameters that may greatly affect its performance. In this study, we investigate parameter selection and adaptation strategies in a modified firefly algorithm, the adaptive firefly algorithm (AdaFa). AdaFa incorporates three strategies: (1) a distance-based light absorption coefficient; (2) a gray coefficient that helps fireflies efficiently share difference information from attractive ones; and (3) five different dynamic strategies for the randomization parameter. Promising parameter selections within these strategies are analyzed to guarantee the efficient performance of AdaFa. AdaFa is validated on widely used benchmark functions, and the numerical experiments and statistical tests yield useful conclusions about how the strategies and parameter selections affect its performance. When applied to a real-world problem, protein tertiary structure prediction, the results demonstrated that the improved variants can rebuild the tertiary structure with average root mean square deviations of less than 0.4 Å and 1.5 Å from the native constraints under noise-free conditions and with 10% Gaussian white noise, respectively.
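A distance-based absorption coefficient and a dynamic randomization parameter, two of the strategies named above, can be illustrated in a minimal firefly-algorithm sketch. The specific forms used here (gamma = 1/(1 + r²) and a geometric alpha decay) are assumptions for illustration, not AdaFa's actual update rules.

```python
import math
import random


def firefly_minimize(f, dim, n=15, iters=100, beta0=1.0, alpha0=0.2, seed=0):
    """Minimal firefly algorithm with a distance-adaptive absorption
    coefficient and a decaying randomization parameter (illustrative)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n)]
    best = min(pop, key=f)
    for t in range(iters):
        alpha = alpha0 * 0.97 ** t          # one simple dynamic strategy (assumed)
        fit = [f(x) for x in pop]
        for i in range(n):
            for j in range(n):
                if fit[j] < fit[i]:         # firefly j is brighter: attract i
                    r2 = sum((a - b) ** 2 for a, b in zip(pop[i], pop[j]))
                    gamma = 1.0 / (1.0 + r2)  # distance-based absorption (assumed form)
                    beta = beta0 * math.exp(-gamma * r2)
                    pop[i] = [a + beta * (b - a) + alpha * (rng.random() - 0.5)
                              for a, b in zip(pop[i], pop[j])]
                    fit[i] = f(pop[i])      # re-evaluate after the move
        cand = min(pop, key=f)
        if f(cand) < f(best):
            best = list(cand)
    return best
```

Making gamma shrink with distance keeps the attraction term from vanishing between far-apart fireflies, which is the motivation for a distance-adaptive absorption coefficient.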
In the strategy, the relationship between the parameter and the population size, varying with different power values.
The kernel smoothing density estimates of (a) TM-Score, (b) GDT-TS-Score, (c) GDT-HA-Score, and (d) RMSD achieved over the native constraints. The five AdaFa variants are represented by the red solid, black dotted, blue dot-dashed, magenta dashed, and green solid lines, respectively.
Different strategies: panels (a), (b), and (c) show three strategies varying with different parameter values.
The results achieved by different algorithms on the benchmark functions.
The comparison of computational efficiency of each algorithm on the test functions.
The comparison of different strategies for the randomization parameter.