67 research outputs found

    Prediction of aptamer-protein interacting pairs using an ensemble classifier in combination with various protein sequence attributes

    The ranked feature list produced by the Relief algorithm. In this list, a feature with a smaller index is more important for predicting aptamer-protein interacting pairs. This ranked list is used to establish the optimal feature set in the incremental feature selection (IFS) procedure. (XLS 56.5 kB)
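The Relief ranking and the IFS procedure can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes binary 0/1 labels, a deterministic Relief variant that visits every instance once, and Manhattan distance on range-scaled features. The helper names `relief_weights`, `rank_features` and `incremental_feature_selection`, and the `evaluate` callback (for example, cross-validated accuracy of a classifier trained on the chosen features), are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

def relief_weights(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Binary Relief, visiting every instance once (deterministic variant)."""
    n, p = len(X), len(X[0])
    # Per-feature value ranges, used to scale diff() into [0, 1].
    span = []
    for f in range(p):
        col = [row[f] for row in X]
        span.append((max(col) - min(col)) or 1.0)

    def diff(f: int, a, b) -> float:
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b) -> float:
        # Manhattan distance on range-scaled features (an assumption; Euclidean is also common).
        return sum(diff(f, a, b) for f in range(p))

    w = [0.0] * p
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        near_hit = min(hits, key=lambda j: dist(X[i], X[j]))
        near_miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(p):
            # Reward features that differ on the nearest miss; penalize those that differ on the nearest hit.
            w[f] += (diff(f, X[i], X[near_miss]) - diff(f, X[i], X[near_hit])) / n
    return w

def rank_features(X, y) -> List[int]:
    """Feature indices sorted from most to least important (largest weight first)."""
    w = relief_weights(X, y)
    return sorted(range(len(w)), key=lambda f: -w[f])

def incremental_feature_selection(
    ranking: List[int], evaluate: Callable[[List[int]], float]
) -> Tuple[List[int], float]:
    """Add ranked features one at a time and keep the best-scoring prefix."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranking) + 1):
        score = evaluate(ranking[:k])
        if score > best_score:
            best_k, best_score = k, score
    return ranking[:best_k], best_score
```

A smaller index in the output of `rank_features` means a more important feature, which matches the convention described for the supplementary list.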

    Serology assessment of antibody response to SARS-CoV-2 in patients with COVID-19 by rapid IgM/IgG antibody test

    The coronavirus disease 2019 (COVID-19) pandemic has created a global health and economic crisis. Serological detection of antibodies to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19, is important for diagnosing a current or resolved infection. In this study, we applied a rapid COVID-19 IgM/IgG antibody test and assessed the antibody response to SARS-CoV-2. In PCR-confirmed COVID-19 patients (n = 45), the total antibody detection rate is 92% in hospitalized patients and 79% in non-hospitalized patients. The total IgM and IgG detection rate is 63% in patients with 2 weeks disease duration and 91% in hospitalized patients with >2 weeks disease duration. We also compared different blood sample types, and our results suggest that serum/plasma gives higher sensitivity than whole blood. Test specificity was determined to be 97% on 69 serum/plasma samples collected between 2016 and 2018. Our study provides a comprehensive validation of the rapid COVID-19 IgM/IgG serology test and maps antibody detection patterns in relation to disease progression and hospitalization. Our results support the use of the rapid COVID-19 IgM/IgG test to assess COVID-19 status at both the individual and the population level. © 2020 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.
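The specificity of 97% rests on a small negative panel of 69 samples, and each sample shifts the proportion by about 1.45 percentage points. The short check below lists which true-negative counts are consistent with a rounded whole-percent figure. The abstract does not state the count, so the result is only an inference from rounding. The function name `consistent_counts` is hypothetical.

```python
from typing import List

def consistent_counts(percent: int, n: int) -> List[int]:
    """Counts k in 0..n whose proportion k/n rounds to `percent` (whole percent)."""
    return [k for k in range(n + 1) if round(100 * k / n) == percent]

print(consistent_counts(97, 69))  # [67]
```

With n = 69, only 67 true negatives (2 false positives) rounds to 97%. 66 rounds to 96% and 68 rounds to 99%.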

    An Effective Antifreeze Protein Predictor with Ensemble Classifiers and Comprehensive Sequence Descriptors

    Antifreeze proteins (AFPs) play a pivotal role in the antifreeze effect in overwintering organisms. They have a wide range of applications, such as improving crop production and the quality of frozen foods. Accurate identification of AFPs may provide important clues for deciphering the mechanisms by which AFPs bind ice, and may facilitate the selection of the most appropriate AFPs for specific applications. Based on an ensemble learning technique, this study proposes an AFP identification system called AFP-Ensemble, in which random forest classifiers are trained on different training subsets and then aggregated into a consensus classifier by majority voting. On an independent dataset, the resulting predictor yields a sensitivity of 0.892, a specificity of 0.940, an accuracy of 0.938, and a balanced accuracy of 0.916, which are far better than the results of previous methods. These results show that AFP-Ensemble is an effective and promising predictor for large-scale identification of AFPs. The detailed feature analysis in this study may give useful insights into the molecular mechanisms of AFP-ice interactions and guide related experimental validation. A web server has been developed to implement the proposed method.
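The ensemble idea (train several classifiers on different training subsets, then combine them by majority voting) can be sketched as below. Several assumptions apply. The subsets are built by pairing all minority samples with disjoint chunks of the majority class, which is one common way to handle imbalance, and the abstract does not say how AFP-Ensemble forms its subsets. A nearest-centroid classifier stands in for the random forest so that the sketch needs no third-party library. All helper names are hypothetical. Balanced accuracy is the mean of sensitivity and specificity, and the reported 0.892 and 0.940 indeed average to 0.916.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Classifier = Callable[[Vector], int]

def train_centroid(X: List[Vector], y: List[int]) -> Classifier:
    """Hypothetical stand-in for a random forest: a nearest-centroid classifier."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> int:
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return predict

def balanced_subsets(X, y, minority: int = 1) -> List[Tuple[list, list]]:
    """Pair all minority samples with successive, disjoint chunks of the majority class.
    Labels are assumed to be 0/1."""
    pos = [x for x, t in zip(X, y) if t == minority]
    neg = [x for x, t in zip(X, y) if t != minority]
    subsets = []
    for start in range(0, len(neg), len(pos)):
        chunk = neg[start:start + len(pos)]
        subsets.append((pos + chunk, [minority] * len(pos) + [1 - minority] * len(chunk)))
    return subsets

def majority_vote(classifiers: List[Classifier], x: Vector) -> int:
    """Label predicted by most members (ties go to the label voted first)."""
    return Counter(clf(x) for clf in classifiers).most_common(1)[0][0]

def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    return (sensitivity + specificity) / 2

# Toy usage: 2 positives and 6 negatives give 3 balanced subsets, so 3 voters.
X = [(1.0, 1.0), (0.9, 1.1), (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.2, 0.0), (0.0, 0.2)]
y = [1, 1, 0, 0, 0, 0, 0, 0]
ensemble = [train_centroid(Xs, ys) for Xs, ys in balanced_subsets(X, y)]
print(majority_vote(ensemble, (0.95, 0.95)), majority_vote(ensemble, (0.05, 0.05)))  # 1 0
print(round(balanced_accuracy(0.892, 0.940), 3))  # 0.916
```

An odd number of voters avoids ties in the binary case. In practice one would swap `train_centroid` for a real random forest implementation.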

    A Novel Feature Extraction Method with Feature Selection to Identify Golgi-Resident Protein Types from Imbalanced Data

    The Golgi apparatus (GA) is a major collection and dispatch station for numerous proteins destined for secretion, plasma membranes, and lysosomes. Dysfunction of GA proteins can result in neurodegenerative diseases. Therefore, accurate identification of protein sub-Golgi localization may assist drug development and help explain how the GA is involved in various cellular processes. In this paper, a new computational method is proposed for distinguishing cis-Golgi proteins from trans-Golgi proteins. Based on the concept of Common Spatial Patterns (CSP), a novel feature extraction technique is developed to extract evolutionary information from protein sequences. To deal with the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is adopted. A feature selection method called Random Forest-Recursive Feature Elimination (RF-RFE) is employed to search for the optimal features among the CSP-based features and the g-gap dipeptide composition. Using the optimal features, a Random Forest (RF) classifier distinguishes cis-Golgi proteins from trans-Golgi proteins. Under jackknife cross-validation, the proposed method achieves a sensitivity of 0.889, a specificity of 0.880, an accuracy of 0.885, and a Matthews correlation coefficient (MCC) of 0.765, remarkably outperforming previous methods. On a common independent dataset, the method also achieves significantly improved performance. These results highlight the promising performance of the proposed method for identifying Golgi-resident protein types. Furthermore, the CSP-based feature extraction method may provide guidance for protein function prediction.
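SMOTE balances a dataset by creating synthetic minority samples on line segments between a minority sample and one of its nearest minority neighbours. The NumPy sketch below illustrates that interpolation step under the assumptions of Euclidean distance and at least two minority samples. It is not the authors' implementation. The function name `smote` and its `seed` parameter are hypothetical; the imbalanced-learn library provides a maintained `SMOTE` class for real use.

```python
import numpy as np

def smote(X_min, n_new: int, k: int = 5, seed=None) -> np.ndarray:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    m = len(X_min)
    k = min(k, m - 1)  # cannot have more neighbours than other minority samples
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    out = np.empty((n_new, X_min.shape[1]))
    for s in range(n_new):
        i = rng.integers(m)                 # random minority seed sample
        j = neighbours[i, rng.integers(k)]  # one of its k nearest minority neighbours
        gap = rng.random()                  # position along the segment, in [0, 1)
        out[s] = X_min[i] + gap * (X_min[j] - X_min[i])
    return out

X_min = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote(X_min, n_new=10, k=2, seed=0)
print(synthetic.shape)  # (10, 2)
```

Because every synthetic point is a convex combination of two minority samples, it stays inside the region spanned by the minority class.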

    Searching Cycle-Disjoint Graphs

    In this paper, we consider the edge searching problem on cycle-disjoint graphs. We first improve the running time of the algorithm by Ellis et al. (2004) for computing the vertex separation and the optimal layout of a unicyclic graph from O(n log n) to O(n). Using a linear-time transformation, we can then compute the edge search number of a unicyclic graph in linear time. We also propose an O(n)-time algorithm to compute the edge search number and the optimal edge search strategy of a cycle-disjoint graph in which every cycle has at most three vertices of degree greater than two. We show how to compute the search number of a k-ary cycle-disjoint graph, and we also present some results on approximation algorithms.
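To make the vertex separation concrete, the sketch below uses the standard definition. For a linear layout, each cut position i counts the vertices at positions up to i that still have a neighbour to the right of i, and the layout's separation is the maximum of these counts. The vertex separation of the graph is the minimum of that value over all layouts. This brute force over all n! layouts only illustrates the definition and is exponential. It is not the linear-time algorithm described above. The function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable]

def layout_separation(order: Sequence[Hashable], edges: List[Edge]) -> int:
    """Max over cut positions i of the number of vertices at positions <= i
    that have a neighbour at a position > i."""
    pos = {v: i for i, v in enumerate(order)}
    adj: Dict[Hashable, Set[Hashable]] = {v: set() for v in order}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    best = 0
    for i in range(len(order)):
        crossing = sum(1 for u in order[: i + 1] if any(pos[w] > i for w in adj[u]))
        best = max(best, crossing)
    return best

def vertex_separation(vertices: List[Hashable], edges: List[Edge]) -> int:
    """Brute force over all layouts; only usable for tiny graphs.
    `edges` must be a list (not an iterator) because it is reused per layout."""
    return min(layout_separation(p, edges) for p in permutations(vertices))

path = [(0, 1), (1, 2), (2, 3)]
cycle = path + [(3, 0)]
print(vertex_separation([0, 1, 2, 3], path))   # 1
print(vertex_separation([0, 1, 2, 3], cycle))  # 2
```

The path needs only one "open" vertex at any cut, while closing the cycle forces two. This is why unicyclic graphs need more care than trees.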

    A New Method of Wheelset Bearing Fault Diagnosis

    Rail trains often operate under harsh conditions such as variable speed and heavy loads, so reliable diagnosis of rolling bearing faults under these conditions is vital. This study proposes an adaptive fault identification technique based on multipoint optimal minimum entropy deconvolution adjusted (MOMEDA) and Ramanujan subspace decomposition. MOMEDA optimally filters the signal and enhances the shock component corresponding to the defect, after which Ramanujan subspace decomposition automatically decomposes the signal into a sequence of components. The method's advantage stems from the seamless integration of the two methods and the addition of an adaptive module. It addresses the redundant components and large fault feature extraction errors that conventional signal decomposition and subspace decomposition methods produce for vibration signals under heavy noise. The technique is evaluated through simulation and experiments against widely used signal decomposition techniques. Envelope spectrum analysis shows that it can precisely extract the compound faults present in the bearing, even under significant noise interference. In addition, the signal-to-noise ratio (SNR) and the fault defect index are introduced to quantify the method's denoising and fault extraction capabilities, respectively. The approach works well for identifying bearing faults in train wheelsets.
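The final diagnostic step, envelope spectrum analysis, can be sketched with NumPy alone. The sketch forms the analytic signal through the FFT, in the same way as a Hilbert transform, takes its magnitude as the envelope, and looks at the envelope's spectrum, where bearing fault characteristic frequencies tend to appear. It does not implement MOMEDA or Ramanujan subspace decomposition. The test signal is synthetic: a 1000 Hz "resonance" amplitude-modulated at a hypothetical 50 Hz fault frequency.

```python
import numpy as np

def envelope_spectrum(x, fs: float):
    """Return (frequencies, magnitude) of the envelope spectrum of a real signal x sampled at fs Hz."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Analytic signal via the FFT: keep DC (and Nyquist for even n), double positive freqs, zero negatives.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(np.fft.fft(x) * h)
    envelope = np.abs(analytic)
    envelope -= envelope.mean()  # drop the DC term so it does not dominate the spectrum
    magnitude = np.abs(np.fft.rfft(envelope)) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, magnitude

# Synthetic example: 1 s at 10 kHz, 1000 Hz carrier modulated at 50 Hz.
fs = 10_000
t = np.arange(fs) / fs
x = (1 + 0.5 * np.cos(2 * np.pi * 50 * t)) * np.cos(2 * np.pi * 1000 * t)
freqs, mag = envelope_spectrum(x, fs)
print(freqs[np.argmax(mag)])  # peak at the 50 Hz modulation frequency
```

The carrier disappears from the envelope spectrum and only the modulation (fault) frequency remains. Denoising steps such as MOMEDA aim to make that peak stand out when real noise is present.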

    A Two-Step Feature Selection Method to Predict Cancerlectins by Multiview Features and Synthetic Minority Oversampling Technique

    Cancerlectins inhibit the growth of cancer cells and are currently being employed as therapeutic agents. Accurate identification of cancerlectins should provide insight into the molecular mechanisms of cancers. In this study, a new computational method based on the random forest (RF) algorithm is proposed to further improve cancerlectin identification. Before feature selection, a hybrid feature space is built by combining four individual feature spaces: CTD (composition, transition, and distribution), PseAAC (pseudo amino acid composition), PSSM (position-specific scoring matrix), and disorder. SMOTE (synthetic minority oversampling technique) is applied to address the class imbalance. To reduce feature redundancy and computational complexity, we propose a two-step feature selection process that selects informative features. Prediction strategies are evaluated with 5-fold cross-validation. The proposed method achieves a sensitivity of 0.779, a specificity of 0.717, an accuracy of 0.748, and an MCC (Matthews correlation coefficient) of 0.497. The results are also compared with those of other existing methods on the same dataset using 5-fold cross-validation, and the comparison demonstrates the high effectiveness of our method for predicting cancerlectins.
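The reported accuracy (0.748) equals the mean of sensitivity and specificity, which suggests equal class sizes in the evaluation. Under that assumption, the MCC can be recomputed from the two rates as a consistency check. The helper names below are hypothetical. The MCC formula itself is the standard one on confusion-matrix counts.

```python
import math

def mcc(tp: float, tn: float, fp: float, fn: float) -> float:
    """Matthews correlation coefficient from confusion-matrix counts (0.0 if undefined)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def mcc_from_rates(sensitivity: float, specificity: float,
                   n_pos: float = 1.0, n_neg: float = 1.0) -> float:
    """MCC implied by sensitivity/specificity for given class sizes (equal sizes by default)."""
    tp = sensitivity * n_pos
    tn = specificity * n_neg
    return mcc(tp, tn, n_neg - tn, n_pos - tp)

print(round(mcc_from_rates(0.779, 0.717), 3))  # 0.497
print(round((0.779 + 0.717) / 2, 3))           # 0.748, accuracy with equal class sizes
```

Both recomputed values match the abstract, so the four reported metrics are mutually consistent under the equal-class-size assumption.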

    A Novel Capsule Network with Attention Routing to Identify Prokaryote Phosphorylation Sites

    By denaturing proteins and promoting the formation of multiprotein complexes, protein phosphorylation strongly affects the activity of functional protein molecules and cell signaling. Regulated phosphorylation allows microbes to respond rapidly and reversibly to specific environmental stimuli or niches, and it is closely related to the molecular mechanisms of bacterial drug resistance. Accurate prediction of phosphorylation sites (p-sites) in prokaryotes can therefore help address bacterial resistance and open new perspectives for developing antibacterial drugs. Most existing studies focus on human phosphorylation sites, and tools for identifying phosphorylation sites in prokaryotic proteins remain relatively scarce. This study designs EcapsP, a capsule network-based technique for predicting p-sites in prokaryotes. To address the poor scalability and unreliability of dynamic routing in the output space of capsule networks, a more reliable way of learning the consistency between capsules is introduced. A self-attention mechanism is incorporated into the routing algorithm to capture global information about each capsule, reducing computational effort while enriching the representational capacity of the capsules. To improve the model's robustness, EcapsP introduces shortcuts and unconditional reconfiguration, which increase prediction accuracy and stability. In addition, the study compares prediction performance based on word vectors, physicochemical properties, and mixed features for serine (Ser/S), threonine (Thr/T), and tyrosine (Tyr/Y) p-sites. Comprehensive experimental results show that the accuracy of the technique is close to 70% for identifying all three types of phosphorylation sites in prokaryotes. Importantly, in side-by-side comparisons with other state-of-the-art predictors, the method improves the Matthews correlation coefficient (MCC) by approximately 7%. These results demonstrate the high performance and reliability of EcapsP.
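The idea of replacing iterative dynamic routing with self-attention can be sketched as a single, non-iterative pass. Scaled dot products measure how much the predictions of different input capsules agree for each output capsule. A softmax over input capsules turns the total agreement into coupling coefficients, and the weighted sum is passed through the usual capsule `squash` non-linearity. This is a simplified, hypothetical illustration. The abstract does not give EcapsP's exact routing equations, shortcuts or reconfiguration, and the function names and tensor layout are assumptions.

```python
import numpy as np

def squash(s: np.ndarray, axis: int = -1, eps: float = 1e-9) -> np.ndarray:
    """Capsule non-linearity: keeps direction, maps the length into [0, 1)."""
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def self_attention_routing(u_hat: np.ndarray) -> np.ndarray:
    """One-pass, attention-style routing (simplified, hypothetical sketch).

    u_hat: (n_in, n_out, d) predictions of every input capsule for every output capsule.
    Returns output capsules of shape (n_out, d).
    """
    _, _, d = u_hat.shape
    u = np.transpose(u_hat, (1, 0, 2))                       # (n_out, n_in, d)
    # Scaled dot-product agreement between the predictions of input capsules i and k.
    agreement = u @ np.transpose(u, (0, 2, 1)) / np.sqrt(d)  # (n_out, n_in, n_in)
    scores = agreement.sum(axis=2)                           # total agreement of each input
    scores -= scores.max(axis=1, keepdims=True)              # numerical stability
    c = np.exp(scores)
    c /= c.sum(axis=1, keepdims=True)                        # coupling coefficients, sum to 1 over inputs
    s = np.einsum("ji,jid->jd", c, u)                        # weighted sum per output capsule
    return squash(s)

rng = np.random.default_rng(0)
v = self_attention_routing(rng.normal(size=(8, 2, 4)))  # 8 input capsules, 2 output capsules, d = 4
print(v.shape, np.linalg.norm(v, axis=-1))
```

Because the coupling coefficients come from one matrix product rather than several routing iterations, the cost is fixed per layer. This is the kind of saving in computational effort the abstract attributes to attention-based routing.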