9 research outputs found

    IDENTIFYING MOLECULAR FUNCTIONS OF DYNEIN MOTOR PROTEINS USING EXTREME GRADIENT BOOSTING ALGORITHM WITH MACHINE LEARNING

    Active transport of most cytoplasmic proteins and vesicles depends primarily on dynein motor proteins, and motor proteins are also the cause of muscle contraction. Understanding how dyneins are used in cells relies on structural knowledge. Cytoskeletal motor proteins have different molecular roles and structures, and they belong to three superfamilies: kinesin, dynein and myosin. A number of human diseases, such as Charcot-Marie-Tooth disease and kidney disease, can be attributed to loss of function of specific molecular motor proteins. A precise model for identifying dynein motor proteins is crucial to help scientists understand their molecular role and design therapeutic targets based on their influence on human disease. An accurate and efficient computational methodology is therefore highly desirable, especially one that uses cutting-edge machine learning methods. In this article, we propose a machine learning-based method for predicting the superfamily of cytoskeletal motor proteins, built on extreme gradient boosting (XGBoost). We obtain the initial feature set by extracting protein features from the sequence and from the evolutionary information of the amino acid residues, encoded with BLOSUM62. With XGBoost, the model achieves an accuracy of 0.8676, a precision of 0.8768, a sensitivity of 0.760, a specificity of 0.9752 and an MCC of 0.7536. Our method shows substantial improvements on many of the evaluation metrics compared with other state-of-the-art methods. This study offers an effective model for the classification of dynein proteins and lays a foundation for further research to improve the efficiency of protein functional classification.
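
The evaluation metrics quoted in this abstract (accuracy, precision, sensitivity, specificity, MCC) are all functions of a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' code; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall / true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only for illustration (not the paper's data).
metrics = binary_metrics(tp=76, fp=2, tn=98, fn=24)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MCC ranges from -1 to 1 and stays informative on imbalanced data, which is presumably why these abstracts report it alongside accuracy.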

    Identification of the ubiquitin–proteasome pathway domain by hyperparameter optimization based on a 2D convolutional neural network

    The major mechanism of proteolysis in the cytosol and nucleus is the ubiquitin–proteasome pathway (UPP). The highly controlled UPP affects a wide range of cellular processes and substrates, and flaws in the system can lead to the pathogenesis of a number of serious human diseases. Knowledge about UPPs provides useful hints for understanding cellular processes and for drug discovery. The exponential growth of next-generation sequencing and wet-lab approaches has accelerated the accumulation of unannotated data in online databases, making UPP characterization and analysis more challenging. Computational methods are therefore used as an alternative for fast and accurate identification of UPPs. To this end, we develop a novel deep learning-based predictor named "2DCNN-UPP" for identifying UPPs with a low error rate. The proposed method combines a two-dimensional convolutional neural network (2D-CNN) with dipeptide deviation features. To avoid overfitting, a genetic algorithm is employed to select the optimal features. Finally, the optimized attribute set is fed as input to the 2D-CNN learning engine to build the model. The predictor's overall accuracy and AUC (ROC) were measured with a 10-fold cross-validation test, and it showed superior performance to other state-of-the-art methods in discriminating UPPs. The model was trained with 10-fold cross-validation and then evaluated on an independent test set. As experimentally validated ubiquitination sites emerge, a proteomics-based predictor of ubiquitination needs to be devised. We also evaluated the generalization power of the trained model on the independent test and obtained an accuracy of 0.862, a sensitivity of 0.921, a specificity of 0.803 and a Matthews correlation coefficient (MCC) of 0.730.
Four approaches were applied to the sequences, and the calculated physical properties were combined. Under 10-fold cross-validation, 2DCNN-UPP obtained an AUC (ROC) value of 0.862. We analyzed the relationship between the predicted scores of UPP and non-UPP proteins. Last but not least, this research could support large-scale analysis of the relationship between UPP and non-UPP proteins in particular, and of other protein problems in general, and it might benefit computational biology research. The latest features in our model framework, together with Dipeptide Deviation from Expected Mean (DDE)-based features, could therefore be applied to the prediction of protein structure and function and to other molecules, such as DNA and RNA.
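
The abstract names Dipeptide Deviation from Expected Mean (DDE) features without giving their formula. The sketch below follows the commonly published definition of DDE (observed dipeptide frequency, standardized against a codon-based theoretical mean and variance). It is an assumption that the paper uses this exact variant; the function name `dde_features` and the handling of non-standard residues are illustrative choices.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Number of codons for each amino acid in the standard genetic code (61 sense codons).
CODONS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3, "K": 2, "L": 6,
    "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6, "T": 4, "V": 4, "W": 1, "Y": 2,
}
SENSE_CODONS = 61


def dde_features(sequence):
    """Return the 400-dimensional DDE vector of a protein sequence."""
    # Assumption: residues outside the 20 standard amino acids are simply dropped.
    seq = [aa for aa in sequence.upper() if aa in CODONS]
    n_dipeptides = len(seq) - 1
    if n_dipeptides < 1:
        raise ValueError("need at least two standard residues")

    counts = {}
    for first, second in zip(seq, seq[1:]):
        counts[first + second] = counts.get(first + second, 0) + 1

    features = []
    for first, second in product(AMINO_ACIDS, repeat=2):
        observed = counts.get(first + second, 0) / n_dipeptides  # dipeptide composition
        mean = (CODONS[first] / SENSE_CODONS) * (CODONS[second] / SENSE_CODONS)
        variance = mean * (1 - mean) / n_dipeptides
        features.append((observed - mean) / math.sqrt(variance))
    return features


vector = dde_features("MKVLAAGIVKAA")  # hypothetical toy sequence
print(len(vector))
```

Each sequence, whatever its length, maps to a fixed 400-value vector, which is what allows the features to be arranged as a 2D input for a convolutional network.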

    AI and Machine Learning-based practices in various domains: A Survey

    Machine learning (ML) has become a primary resource in many projects in computational biology (CB), bioinformatics, health informatics (HI), precision medicine (PM) and precision agriculture (PA). In this paper we study the use of machine learning in the development of computational methods for these five research areas. The last few years have seen increased interest in artificial intelligence (AI) and in comprehensive ML and deep learning (DL) techniques for computational method development. Although an enormous amount of research has been carried out over the years, biomedical scientists often still lack the knowledge to handle a biomedical project efficiently and may therefore adopt wrong methods, which can lead to frequent errors or inflated test results. Healthcare has become fertile ground for AI and ML because of the increase in the volume, diversity and complexity of data. Healthcare providers and life sciences businesses already use a variety of AI technologies. The review summarizes a traditional machine learning cycle, several machine learning algorithms, various data analysis techniques, and their effective use in the five research areas. In this comprehensive review, we propose ten rapid and accurate practices for using ML techniques in health informatics, bioinformatics, computational and systems biology, precision medicine and precision agriculture, and for avoiding some common mistakes that we have observed several hundred times in computational method works.

    Identifying disease genes based on machine learning approaches for classification

    In recent years, researchers have become increasingly interested in disease-gene association prediction, one of the toughest tasks of the post-genomic era. It is also a challenge for biological research, since complex disorders sometimes have very varied genotypes. Machine learning methods are widely used to identify disease markers, but their performance depends heavily on the quantity and quality of the available data. In this study, we find that the recognition of disease-related genes can be improved by a machine learning classifier trained on gene functional similarity derived from the Gene Ontology (GO). To predict disease genes, we have developed a supervised machine learning system. The proposed pipeline is assessed on autism spectrum disorder (ASD). Similarity measures based on various semantics are used to quantify similarity in gene function. In this paper we suggest various techniques for classifying data encoded with the one-hot encoding method. The experiment requires the data to be divided into training and test sets, an evaluation procedure generally called the train-test split method. ASD is a neurodevelopmental disorder associated with high healthcare costs, and early intervention can significantly reduce these costs. Unfortunately, wait times for an ASD diagnosis are lengthy and treatments are not cheap. The economic effects of autism and the worldwide increase in ASD cases show an urgent need for screening methods that are efficient and quick to implement. Timely and affordable ASD screening is therefore urgently needed to help health practitioners and to let individuals know whether they should pursue a formal clinical diagnosis. Classifiers trained and validated on ASD and non-ASD genes perform better than previously reported ASD classifiers.
For instance, in predicting new ASD genes, the complementary forest (CF) classifier reached an AUC of 0.80, above the previously reported classifier (0.73). In addition, the classifier predicted 73 novel ASD candidate genes. These genes are enriched for core ASD phenotypes, such as autism and compulsion.
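
The AUC values compared here (0.80 versus 0.73) have a simple interpretation: the probability that a randomly chosen positive example (for this abstract, an ASD gene) receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the function name `roc_auc` and the labels and scores are hypothetical and are not the study's data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for six genes (1 = disease gene, 0 = non-disease gene).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.90, 0.80, 0.70, 0.30, 0.60, 0.65]
print(round(roc_auc(labels, scores), 3))
```

The pairwise form is quadratic in the number of examples, so it suits small illustrations; rank-based implementations give the same value more efficiently.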

    AOPs-XGBoost: Machine learning Model for the prediction of Antioxidant Proteins properties of peptides

    Antioxidant proteins are essential for protecting cells from free radicals. Accurately identifying antioxidant proteins through biological tests is difficult because of the time and financial investment required. Peptides produced from natural proteins show particular potential because they are generally regarded as safe and may have additional beneficial bioactivities. Antioxidative peptides are typically discovered by analyzing the numerous peptides created when a variety of proteases hydrolyze proteins. In the current study, the eXtreme Gradient Boosting (XGBoost) technique was used to create a novel model, which was then compared with the most popular machine learning models. We propose a machine learning model named AOPs-XGBoost, built on sequence features and XGBoost. Ten-fold cross-validation was performed on a testing dataset using the proposed AOPs-XGBoost classifier, and the results showed a sensitivity of 67.56%, a specificity of 93.87%, an average accuracy of 80.72%, a Matthews correlation coefficient (MCC) of 0.6629 and an area under the receiver operating characteristic curve (AUC) of 88.01%. On accuracy and AUC, the XGBoost model outperformed the other models. These experimental results demonstrate that AOPs-XGBoost is a useful classifier that advances the study of antioxidant proteins.
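
Several of the abstracts in this listing rely on 10-fold cross-validation. The sketch below shows only the splitting step, assuming a plain (non-stratified) shuffle; the abstract does not say which variant was used, and the function name `k_fold_indices` and the sample count are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Each sample is held out exactly once; a model would be fit on `train`
# and scored on `test` in every round, and the k scores averaged.
for round_number, (train, test) in enumerate(k_fold_indices(25, k=10), start=1):
    print(round_number, len(train), len(test))
```

With imbalanced classes, a stratified split that preserves the class ratio in every fold is usually preferable to this plain shuffle.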

    XGboost-Ampy: Identification of AMPylation Protein Function Prediction Using Machine Learning

    AMPylation is an emerging post-translational modification that involves the formation of a phosphodiester bond on the hydroxyl group of threonine, serine, or tyrosine. During this process, which is catalyzed by AMPylators, adenosine monophosphate is covalently attached to the side chain of an amino acid in a peptide. We used AMPylation peptide sequence data from bacteria, eukaryotes, and archaea to train the models. We then compared several feature extraction methods and their combinations, together with several classification algorithms, to obtain more accurate prediction models. To prevent additional loss of sequence information, the PseAAC feature is employed to construct a fixed-size descriptor in vector space. The basic feature set is obtained from a second feature extraction method, which derives protein characteristics from the sequence and from the evolutionary information of the amino acid residues encoded with BLOSUM62. The eXtreme Gradient Boosting (XGBoost) technique was used to create a novel model for the current study, which was then compared with the most popular machine learning models. In this research, we propose XGBoost-Ampy, a framework for AMPylation identification that uses the XGBoost algorithm and sequence-derived features. XGBoost-Ampy has an accuracy of 86.7%, a sensitivity of 76.1%, a specificity of 97.5%, and a Matthews correlation coefficient (MCC) of 0.753 for predicting AMPylation sites. XGBoost-Ampy, the first machine learning model developed for this task, has shown promise and may be able to help with this problem.