6 research outputs found

    Classification of Prostate Cancer Patients into Indolent and Aggressive Using Machine Learning

    Get PDF
    Prostate cancer (PCa) is the second most common cancer in men in the US. Many Prostate cancers are Indolent and don’t result in cancer mortality, even without treatment. However, a significant proportion of patients with Prostate cancer have aggressive tumors that progress rapidly to metastatic disease and are often dangerous. Currently, treatment decisions for PCa patients are guided by various stratification algorithms. Among these parameters, the most important predictor of PCa mortality is the Gleason Grade (ranges from 6 to 10). Although current risk stratification tools are moderately effective, limitation remains in their ability to distinguish truly Indolent from aggressive and potentially lethal disease. Here we propose the use of Machine Learning (ML) for the classification of PC patients as having either indolent or aggressive using transcriptome data. We hypothesize that genomic alterations could lead to measurable changes distinguishing indolent from aggressive tumors. We also trained a Stacking-based model with a different set of combinations of classifiers. The highest overall accuracy of our stacking model (all samples with Gleason Grade: 6, 7, 8, 9, and 10) is 95.758% and (samples with Gleason Grade: 6, 8, 9, and 10) is 97.19%

    Prediction of DNA-Binding Proteins and their Binding Sites

    Get PDF
    DNA-binding proteins play an important role in various essential biological processes such as DNA replication, recombination, repair, gene transcription, and expression. The identification of DNA-binding proteins and the residues involved in the contacts is important for understanding the DNA-binding mechanism in proteins. Moreover, it has been reported in the literature that the mutations of some DNA-binding residues on proteins are associated with some diseases. The identification of these proteins and their binding mechanism generally require experimental techniques, which makes large scale study extremely difficult. Thus, the prediction of DNA-binding proteins and their binding sites from sequences alone is one of the most challenging problems in the field of genome annotation. Since the start of the human genome project, many attempts have been made to solve the problem with different approaches, but the accuracy of these methods is still not suitable to do large scale annotation of proteins. Rather than relying solely on the existing machine learning techniques, I sought to combine those using novel “stacking technique” and used the problem-specific architectures to solve the problem with better accuracy than the existing methods. This thesis presents a possible solution to the DNA-binding proteins prediction problem which performs better than the state-of-the-art approaches

    Prediction of DNA-Binding Proteins and their Binding Sites

    Get PDF
    DNA-binding proteins play an important role in various essential biological processes such as DNA replication, recombination, repair, gene transcription, and expression. The identification of DNA-binding proteins and the residues involved in the contacts is important for understanding the DNA-binding mechanism in proteins. Moreover, it has been reported in the literature that the mutations of some DNA-binding residues on proteins are associated with some diseases. The identification of these proteins and their binding mechanism generally require experimental techniques, which makes large scale study extremely difficult. Thus, the prediction of DNA-binding proteins and their binding sites from sequences alone is one of the most challenging problems in the field of genome annotation. Since the start of the human genome project, many attempts have been made to solve the problem with different approaches, but the accuracy of these methods is still not suitable to do large scale annotation of proteins. Rather than relying solely on the existing machine learning techniques, I sought to combine those using novel “stacking technique” and used the problem-specific architectures to solve the problem with better accuracy than the existing methods. This thesis presents a possible solution to the DNA-binding proteins prediction problem which performs better than the state-of-the-art approaches

    Identification of RNA Binding Proteins and RNA Binding Residues Using Effective Machine Learning Techniques

    Get PDF
    Identification and annotation of RNA Binding Proteins (RBPs) and RNA Binding residues from sequence information alone is one of the most challenging problems in computational biology. RBPs play crucial roles in several fundamental biological functions including transcriptional regulation of RNAs and RNA metabolism splicing. Existing experimental techniques are time-consuming and costly. Thus, efficient computational identification of RBPs directly from the sequence can be useful to annotate RBP and assist the experimental design. Here, we introduce AIRBP, a computational sequence-based method, which utilizes features extracted from evolutionary information, physiochemical properties, and disordered properties to train a machine learning method designed using stacking, an advanced machine learning technique, for effective prediction of RBPs. Furthermore, it makes use of efficient machine learning algorithms like Support Vector Machine, Logistic Regression, K-Nearest Neighbor and XGBoost (Extreme Gradient Boosting Algorithm). In this research work, we also propose another predictor for efficient annotation of RBP residues. This RBP residue predictor also uses stacking and evolutionary algorithms for efficient annotation of RBPs and RNA Binding residue. The RNA-binding residue predictor also utilizes various evolutionary, physicochemical and disordered properties to train a robust model. This thesis presents a possible solution to the RBP and RNA binding residue prediction problem through two independent predictors, both of which outperform existing state-of-the-art approaches

    Identification of RNA Binding Proteins and RNA Binding Residues Using Effective Machine Learning Techniques

    Get PDF
    Identification and annotation of RNA Binding Proteins (RBPs) and RNA Binding residues from sequence information alone is one of the most challenging problems in computational biology. RBPs play crucial roles in several fundamental biological functions including transcriptional regulation of RNAs and RNA metabolism splicing. Existing experimental techniques are time-consuming and costly. Thus, efficient computational identification of RBPs directly from the sequence can be useful to annotate RBP and assist the experimental design. Here, we introduce AIRBP, a computational sequence-based method, which utilizes features extracted from evolutionary information, physiochemical properties, and disordered properties to train a machine learning method designed using stacking, an advanced machine learning technique, for effective prediction of RBPs. Furthermore, it makes use of efficient machine learning algorithms like Support Vector Machine, Logistic Regression, K-Nearest Neighbor and XGBoost (Extreme Gradient Boosting Algorithm). In this research work, we also propose another predictor for efficient annotation of RBP residues. This RBP residue predictor also uses stacking and evolutionary algorithms for efficient annotation of RBPs and RNA Binding residue. The RNA-binding residue predictor also utilizes various evolutionary, physicochemical and disordered properties to train a robust model. This thesis presents a possible solution to the RBP and RNA binding residue prediction problem through two independent predictors, both of which outperform existing state-of-the-art approaches
    corecore