12 research outputs found

    UBI-XGB: IDENTIFICATION OF UBIQUITIN PROTEINS USING MACHINE LEARNING MODEL

    Ubiquitination is a pervasive, proteasome-mediated protein degradation process that controls apoptosis and plays a crucial role in the breakdown of proteins and the development of cell disorders. Protein turnover and ubiquitination are closely related processes. Recent developments in proteomic technology have sparked renewed interest in identifying ubiquitination sites in a number of human disorders, which have been studied experimentally and clinically. Because experimental identification of protein ubiquitination sites is typically labor- and time-intensive, reliable computational predictors are needed. In this paper, we propose Ubipro-XGBoost, a multi-view, feature-based machine learning method for predicting ubiquitination sites. First, we encode protein sequence features into matrix data using the Dipeptide Deviation from Expected Mean (DDE) feature encoding; we also use a second feature extraction model, dipeptide composition (DPC). These features are then fed into an extreme gradient boosting (XGBoost) classifier. As more experimentally verified ubiquitination sites appear, the resulting predictive algorithm can locate lysine ubiquitination sites in large-scale proteome data. Evaluated by the area under the Receiver Operating Characteristic curve (AUC) and related metrics on 5-fold cross-validation, Ubipro-XGBoost achieved 0.914 accuracy, 0.836 sensitivity, 0.992 specificity, and 0.839 MCC with the DPC model, and 0.909 accuracy, 0.839 sensitivity, 0.979 specificity, and 0.829 MCC with the DDE model. The findings demonstrate that Ubipro-XGBoost outperforms conventional ubiquitination prediction methods and offers new guidance for ubiquitination site identification.
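
The two encodings named in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions: DPC is the frequency of each of the 400 dipeptides, and DDE standardizes that frequency against a theoretical mean and variance derived from codon counts in the standard genetic code. The abstract does not give formulas, so the function names `dpc` and `dde` and the handling of non-standard residues are illustrative assumptions, not the authors' implementation.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order that defines the feature vector.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

# Number of sense codons per amino acid in the standard genetic code (61 total).
CODONS = {"A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
          "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
          "T": 4, "V": 4, "W": 1, "Y": 2}
TOTAL_CODONS = sum(CODONS.values())  # 61


def dpc(seq):
    """Dipeptide composition: count of each dipeptide divided by (L - 1)."""
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    n_pairs = len(seq) - 1
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        if pair in counts:  # assumption: pairs with non-standard residues are skipped
            counts[pair] += 1
    return [counts[d] / n_pairs for d in DIPEPTIDES]


def dde(seq):
    """Dipeptide Deviation from Expected mean: (DPC - Tm) / sqrt(Tv)."""
    n_pairs = len(seq) - 1
    features = []
    for dipeptide, observed in zip(DIPEPTIDES, dpc(seq)):
        # Theoretical mean from codon usage of the two residues.
        tm = (CODONS[dipeptide[0]] / TOTAL_CODONS) * (CODONS[dipeptide[1]] / TOTAL_CODONS)
        # Theoretical variance of the dipeptide frequency.
        tv = tm * (1 - tm) / n_pairs
        features.append((observed - tm) / math.sqrt(tv))
    return features
```

Either function returns a 400-element vector per sequence; stacking these vectors row by row would give the feature matrix that a classifier such as XGBoost takes as input.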

    IDENTIFYING MOLECULAR FUNCTIONS OF DYNEIN MOTOR PROTEINS USING EXTREME GRADIENT BOOSTING ALGORITHM WITH MACHINE LEARNING

    Most cytoplasmic proteins and vesicles are actively transported primarily by dynein motor proteins, which are the cause of muscle contraction. Moreover, identifying how dyneins are used in cells will rely on structural knowledge. Cytoskeletal motor proteins have different molecular roles and structures, and they belong to three superfamilies: kinesin, dynein, and myosin. Loss of function of specific molecular motor proteins has been linked to a number of human diseases, such as Charcot-Charcot-Dystrophy and kidney disease. A precise model for identifying dynein motor proteins is crucial to help scientists understand their molecular role and design therapeutic targets based on their influence on human disease. An accurate and efficient computational methodology is therefore highly desired, especially one that uses cutting-edge machine learning methods. In this article, we propose a machine learning-based method for predicting the superfamily of cytoskeletal motor proteins, built on extreme gradient boosting (XGBoost). We obtain the initial feature set by extracting protein features from the sequence and from the evolutionary information of the amino acid residues, encoded with BLOSUM62. Our XGBoost model achieved an accuracy of 0.8676, precision of 0.8768, sensitivity of 0.760, specificity of 0.9752, and MCC of 0.7536. The method shows substantial improvements on many of the evaluation metrics compared with other state-of-the-art methods. This study offers an effective model for the classification of dynein proteins and lays a foundation for further research to improve the efficiency of protein functional classification.
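
The evaluation metrics reported here all derive from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the function name `binary_metrics` and the choice to return 0.0 when a denominator is zero are assumptions made for illustration, and the counts in the usage note are made up rather than taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    def ratio(numerator, denominator):
        # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),   # recall / true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }
```

For example, `binary_metrics(tp=40, fp=10, tn=45, fn=5)` gives an accuracy of 0.85 and a precision of 0.8. All five values lie on a 0 to 1 scale except MCC, which ranges from -1 to 1.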

    Identification of the ubiquitin–proteasome pathway domain by hyperparameter optimization based on a 2D convolutional neural network

    The ubiquitin–proteasome pathway (UPP) is the major mechanism of proteolysis in the cytosol and nucleus. The highly controlled UPP affects a wide range of cellular processes and substrates, and defects in the system can lead to the pathogenesis of a number of serious human diseases. Knowledge about UPPs provides useful hints for understanding cellular processes and for drug discovery. The exponential growth of next-generation sequencing wet-lab approaches has accelerated the accumulation of unannotated data in online databases, making UPP characterization and analysis more challenging. Computational methods are therefore used as an alternative for fast and accurate identification of UPPs, and as experimentally validated ubiquitination sites emerge, a proteomics-based predictor of ubiquitination is needed. To this end, we develop a novel deep learning-based predictor named “2DCNN-UPP” for identifying UPPs with a low error rate. The proposed method uses a two-dimensional convolutional neural network (2D-CNN) with dipeptide deviation features. Four approaches were applied to the sequences, and the calculated physical properties were combined. To avoid overfitting, a genetic algorithm is employed to select the optimal features. Finally, the optimized attribute set is fed as input to the 2D-CNN learning engine to build the model. The model was trained with 10-fold cross-validation and then evaluated on an independent test set. Under 10-fold cross-validation, 2DCNN-UPP obtained an AUC (ROC) value of 0.862 and showed superior performance compared with other state-of-the-art methods for UPP classification. On the independent test, which we used to evaluate the generalization power of the trained model, it obtained remarkable performance: 0.862 accuracy, 0.921 sensitivity, 0.803 specificity, and 0.730 Matthews correlation coefficient (MCC). We also analyzed the relationship between the predicted scores of UPP and non-UPP proteins. Last but not least, this research could support large-scale analysis of the relationship between UPP and non-UPP proteins in particular and of other protein problems in general, and might improve computational biology research. The latest features in our model framework, together with Dipeptide Deviation from Expected Mean (DDE)-based protein features, could therefore be used to predict the structure and function of proteins and of other molecules, such as DNA and RNA.
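
The 10-fold cross-validation protocol mentioned in this abstract can be sketched as an index generator: the samples are shuffled once, dealt into k folds, and each fold serves as the held-out set exactly once. The function name `k_fold_indices`, the fixed seed, and the lack of class stratification are assumptions for illustration; the abstract does not say how the folds were built.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, test
```

A model would be fitted on each `train` split and scored on the matching `test` split, with the k scores averaged. An independent test set, as used in the abstract, is kept out of this loop entirely.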

    AI and Machine Learning-based practices in various domains: A Survey

    Machine learning (ML) has become a primary resource in many projects in computational biology (CB), bioinformatics, health informatics (HI), precision medicine (PM), and precision agriculture (PA). In this paper we study the use of machine learning in the development of computational methods for these five research areas. The last few years have seen increased interest in artificial intelligence (AI) and in comprehensive ML and deep learning (DL) techniques for computational method development. Over the years an enormous amount of research has been carried out, yet biomedical scientists often still lack the knowledge to handle a biomedical project efficiently and may therefore adopt the wrong methods, which can lead to frequent errors or inflated test results. Healthcare has become fruitful ground for AI and ML because of the increase in the volume, diversity, and complexity of data, and healthcare providers and life sciences businesses already use a variety of AI technologies. The review summarizes a traditional machine learning cycle, several machine learning algorithms, various data analysis techniques, and effective use in the five research areas. In this comprehensive review, we propose ten rapid and accurate practices for using ML techniques in health informatics, bioinformatics, computational and systems biology, precision medicine, and precision agriculture, and for avoiding common mistakes that we have observed several hundred times in computational method works.

    Identifying disease genes based on machine learning approaches for classification

    In recent years, researchers have become increasingly interested in disease-gene association prediction, one of the toughest tasks of the postgenomic era. It is also a challenging one for biological research, since complex disorders often have highly varied genotypes. Machine learning methods are widely used to identify disease markers, but their performance depends heavily on the quantity and quality of the available data. In this study, we find that the recognition of disease-related genes can be improved by a machine learning classifier trained on gene functional similarity derived from Gene Ontology (GO). To predict disease genes, we developed a supervised machine learning system, and we assess the proposed pipeline on autism spectrum disorder (ASD). Similarity measures based on various semantics are used to quantify similarity in gene function. In this paper we suggest various techniques for classifying data encoded with the one-hot encoding method. The experiment is complicated by the fact that the data must be divided into training and test sets, an evaluation procedure generally called the train-test split method. ASD is a neurodevelopmental disorder associated with high health care costs, and early intervention can significantly reduce these costs. Unfortunately, wait times for an ASD diagnosis are lengthy and treatments are not cheap. The economic effects of autism and the worldwide increase in ASD cases show an urgent need for screening methods that are quick to implement and effective. Timely and affordable ASD screening is therefore urgently needed to help health practitioners and to let individuals know whether they should pursue a formal clinical diagnosis. Classifiers trained and validated on ASD and non-ASD genes perform better than previously reported ASD classifiers. For instance, in predicting new ASD genes, the complementary forest (CF) classifier reached an AUC of 0.80, above the previously reported classifier (0.73). In addition, the classifier predicts 73 novel ASD candidate genes. These genes are enriched for core ASD traits, such as autism and compulsion.
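
The one-hot encoding and train-test split steps named in this abstract can be sketched as follows. The abstract does not say which attributes were encoded or what split ratio was used, so the helper names `one_hot` and `train_test_split`, the 25% test fraction, and the fixed seed are all hypothetical choices for illustration.

```python
import random


def one_hot(values):
    """Map each categorical value to a 0/1 vector with a single 1."""
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[position[value]] = 1
        rows.append(row)
    return categories, rows


def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle once, then hold out a fraction of the samples for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(sequence, ids):
        return [sequence[i] for i in ids]

    return (pick(rows, train_idx), pick(rows, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))
```

The classifier is fitted only on the training portion and scored on the held-out portion, so the reported score reflects samples the model has not seen.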

    AOPs-XGBoost: Machine learning Model for the prediction of Antioxidant Proteins properties of peptides

    Antioxidant proteins are essential for protecting cells from free radicals. Accurate identification of antioxidant proteins through biological tests is difficult because of the time and financial investment required. Peptides produced from natural proteins show particular potential because they are generally regarded as safe and may have additional advantageous bioactivities. Antioxidative peptides are typically discovered by analyzing the numerous peptides created when a variety of proteases hydrolyze proteins. In the current study, we propose a machine learning model named AOPs-XGBoost, built on sequence features and eXtreme Gradient Boosting (XGBoost), and compare it with the most popular machine learning models. We performed 10-fold cross-validation on a testing dataset using the proposed AOPs-XGBoost classifier, and the results showed a sensitivity of 67.56%, specificity of 93.87%, average accuracy of 80.72%, Matthews correlation coefficient (MCC) of 66.29%, and area under the receiver operating characteristic curve (AUC) of 88.01%. With these accuracy and AUC values, the XGBoost model outperformed the other models. Experimental results demonstrate that AOPs-XGBoost is a useful classifier that advances the study of antioxidant proteins.
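
The AUC figure reported here can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition, counting ties as half a win. The function name `roc_auc` and the example scores are illustrative only, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.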

    A methodology for modelling and analysis of secure systems using security patterns and mitigation use cases

    Many approaches for modelling security requirements have been proposed, but the software industry has not reached agreement on how to express security requirements in a system model for software architecture. The main objective of this perspective paper is to summarize the problem space of representing security patterns, which are proposed in the literature to help developers who lack security expertise implement security. The application of security patterns has been hindered by the fact that they lack directions for their implementation in a specific scenario. This paper presents a technique for using mitigation use cases to represent the solutions provided by security patterns. Different challenges and issues related to the application of security patterns in industry are identified.