
    Prediction Enhancement of Residue Real-Value Relative Accessible Surface Area in Transmembrane Helical Proteins by Solving the Output Preference Problem of Machine Learning-Based Predictors

    α-Helical transmembrane proteins constitute 25% of the entire human proteome and are difficult targets for high-resolution wet-lab structural studies, calling for accurate computational predictors. We present a novel sequence-based method, MemBrain-Rasa, to predict residue relative accessible surface area (rASA) from primary sequences. MemBrain-Rasa features an ensemble prediction protocol composed of a statistical machine-learning engine, trained in the sequential feature space, and a segment-template similarity-based engine, constructed from solved structures and sequence alignments. We locally built a comprehensive database of residue rASA values from solved protein 3D structures in the PDB, which is searched for segment templates expected to be structurally similar to segments of the query sequence. The template-based prediction is then fused with the support vector regression outputs using knowledge rules. Our experiments show that pure machine-learning output cannot cover the entire rASA solution space and suffers a serious prediction preference problem, owing to the relatively small number of membrane protein structures available as training samples. The template-based engine solves this problem well, significantly improving prediction performance. MemBrain-Rasa achieves a Pearson correlation coefficient of 0.733 and a mean absolute error of 13.593 on the benchmark dataset, which are 26.4% and 26.1% better than existing predictors. MemBrain-Rasa represents new progress in structure modeling of α-helical transmembrane proteins. It is available at www.csbio.sjtu.edu.cn/bioinf/MemBrain/
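    The fusion step described above can be sketched as a simple per-residue knowledge rule: prefer the template-based prediction when a sufficiently similar segment template was found, and fall back to the machine-learning (SVR) output otherwise. This is a minimal illustration, not the authors' code; the function name, threshold, and rule are assumptions.

```python
def fuse_rasa(svr_pred, template_pred, template_score, threshold=0.5):
    """Per-residue fusion sketch: trust the segment-template prediction when a
    reliable template exists (similarity score above an assumed threshold),
    otherwise fall back to the SVR output. All names here are illustrative."""
    fused = []
    for svr, tpl, score in zip(svr_pred, template_pred, template_score):
        if tpl is not None and score >= threshold:
            fused.append(tpl)   # high-confidence template hit
        else:
            fused.append(svr)   # machine-learning fallback
    return fused

# Toy example: templates recover extreme rASA values that a preference-biased
# learner compresses toward the middle of the range.
svr = [0.30, 0.35, 0.40]
tpl = [0.05, None, 0.90]
scores = [0.8, 0.0, 0.9]
print(fuse_rasa(svr, tpl, scores))  # [0.05, 0.35, 0.9]
```

    The rule-based fallback is what lets the template engine widen the output range beyond what the machine-learning engine alone covers.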

    Signal-3L 2.0: A Hierarchical Mixture Model for Enhancing Protein Signal Peptide Prediction by Incorporating Residue-Domain Cross-Level Features

    Signal peptides play key roles in the targeting and translocation of integral membrane proteins and secretory proteins. However, signal peptides present several challenges for automatic prediction methods. One challenge is that it is difficult to discriminate signal peptides from transmembrane helices, as both the H-region of the peptides and the transmembrane helices are hydrophobic. Another is that it is difficult to identify the cleavage site between a signal peptide and the mature protein, as cleavage motifs or patterns are still unclear for most proteins. To solve these problems and further enhance automatic signal peptide recognition, we report a new Signal-3L 2.0 predictor. Our new model is constructed with a hierarchical protocol, where it first determines the existence of a signal peptide. For this, we propose a new residue-domain cross-level feature-driven approach, and we demonstrate that protein functional domain information is particularly useful for discriminating between transmembrane helices and signal peptides, as they perform different functions. Next, in order to accurately identify the unique signal peptide cleavage site along the sequence, we designed a top-down approach where a subset of potential cleavage sites is screened using statistical learning rules, and then a final unique site is selected according to its evolutionary conservation score. Because this mixed approach utilizes both statistical learning and evolutionary analysis, it shows a strong capacity for recognizing cleavage sites. Signal-3L 2.0 has been benchmarked on multiple data sets, and the experimental results have demonstrated its accuracy. The online server is available at www.csbio.sjtu.edu.cn/bioinf/Signal-3L/
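    The top-down cleavage-site procedure can be illustrated as a two-stage filter: first keep the candidate positions ranked highest by a statistical-learning score, then pick the single site with the best conservation score among them. This is a hedged sketch of the idea only; the function, scores, and `top_k` cutoff are illustrative assumptions, not Signal-3L 2.0's implementation.

```python
def pick_cleavage_site(stat_scores, conservation, top_k=5):
    """Two-stage selection sketch (illustrative, not the published method):
    1) screen positions by statistical-learning score, keeping the top_k
       candidates;
    2) among those, select the unique site with the highest evolutionary
       conservation score."""
    candidates = sorted(range(len(stat_scores)),
                        key=lambda i: stat_scores[i], reverse=True)[:top_k]
    return max(candidates, key=lambda i: conservation[i])

# Toy example: position 2 scores highest statistically, but position 3 is the
# most conserved among the screened candidates and is chosen as the final site.
stat = [0.1, 0.7, 0.9, 0.6, 0.2]
cons = [0.3, 0.8, 0.4, 0.9, 0.1]
print(pick_cleavage_site(stat, cons, top_k=3))  # 3
```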

    In strategy , the relationship between and the size of problem with different population size .

    Adaptive Firefly Algorithm: Parameter Analysis and its Application

    As a nature-inspired search algorithm, the firefly algorithm (FA) has several control parameters that can strongly affect its performance. In this study, we investigate parameter selection and adaptation strategies in a modified firefly algorithm, the adaptive firefly algorithm (AdaFa). AdaFa incorporates three strategies: (1) a distance-based light absorption coefficient; (2) a gray coefficient that helps fireflies efficiently share difference information from more attractive ones; and (3) five different dynamic strategies for the randomization parameter. Promising parameter selections for these strategies are analyzed to guarantee the efficient performance of AdaFa. AdaFa is validated on widely used benchmark functions, and the numerical experiments and statistical tests yield useful conclusions on how the strategies and parameter selections affect its performance. When applied to a real-world problem, protein tertiary structure prediction, the improved variants can rebuild the tertiary structure with an average root mean square deviation of less than 0.4 Å from noise-free native constraints and less than 1.5 Å from constraints carrying 10% Gaussian white noise.
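    The core move of the firefly algorithm, which the strategies above adapt, pulls a dimmer firefly toward a brighter one with an attractiveness that decays with distance, plus a random kick scaled by the randomization parameter. The sketch below shows one such move with a distance-dependent absorption coefficient standing in for AdaFa's first strategy; the specific formula for gamma and the parameter values are assumptions for illustration, not the published method.

```python
import math
import random

def firefly_step(x_i, x_j, beta0=1.0, alpha=0.2, rng=random.Random(0)):
    """One generic FA move of firefly x_i toward a brighter firefly x_j.
    gamma here decays with distance as an illustrative stand-in for AdaFa's
    distance-based light absorption coefficient; beta0 and alpha are assumed."""
    # Euclidean distance between the two fireflies
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))
    gamma = 1.0 / (1.0 + r)               # assumed distance-based absorption
    beta = beta0 * math.exp(-gamma * r * r)  # attractiveness decays with r
    # Move toward x_j plus a small randomization term scaled by alpha
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(x_i, x_j)]

moved = firefly_step([0.0, 0.0], [1.0, 1.0])
print(moved)  # a point pulled toward [1, 1], with a small random perturbation
```

    AdaFa's dynamic strategies would replace the fixed `alpha` with a schedule that shrinks the random kick as the search converges.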

    In strategy , the relationship between and the population size varying with different power value .

    The kernel smoothing density estimate of (a) TM-Score, (b) GDT-TS-Score, (c) GDT-HA-Score, and (d) RMSD achieved over the native constraints.

    AdaFa-–AdaFa- are represented by a red solid line, a black dotted line, a blue dotted-dashed line, a magenta dashed line, and a green solid line, respectively.

    Different strategies for : (a) strategy , (b) strategy , (c) strategy varying with different .

    The results achieved by different algorithms on the benchmark functions.

    The comparison of computational efficiency of each algorithm on the test functions.

    The comparison of different strategies for the randomization parameter .