3,299 research outputs found

    A Methodological Framework to Discover Pharmacogenomic Interactions Based on Random Forests

    The identification of genomic alterations in tumor tissues, including somatic mutations, deletions, and gene amplifications, produces large amounts of data, which can be correlated with a diversity of therapeutic responses. We aimed to provide a methodological framework to discover pharmacogenomic interactions based on Random Forests. We matched two databases, one from the Cancer Cell Line Encyclopaedia (CCLE) project and one from the Genomics of Drug Sensitivity in Cancer (GDSC) project. For a total of 648 shared cell lines, we considered 48,270 gene alterations from CCLE as input features and the area under the dose-response curve (AUC) for 265 drugs from GDSC as the outcomes. A three-step reduction to 501 alterations was performed, selecting known driver genes and excluding very frequent/infrequent alterations and redundant ones. For each model, we used the concordance correlation coefficient (CCC) for assessing the predictive performance, and permutation importance for assessing the contribution of each alteration. In a reasonable computational time (56 min), we identified 12 compounds whose response was at least fairly sensitive (CCC > 20) to the alteration profiles. Some differences were found among the sets of influential alterations, providing clues to discover significant drug-gene interactions. The proposed methodological framework can be helpful for mining pharmacogenomic interactions.
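
The concordance correlation coefficient named in this abstract can be illustrated with a short sketch. The code below assumes the usual definition of Lin's CCC with population (1/n) variances; the function name and the toy inputs are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def concordance_correlation(observed, predicted):
    """Lin's concordance correlation coefficient for two equal-length sequences.

    Assumption: population (1/n) variances and covariance are used.
    The result lies between -1 and 1; it equals 1 only when every
    prediction matches its observation exactly.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(observed)
    mean_o = fmean(observed)
    mean_p = fmean(predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    denominator = var_o + var_p + (mean_o - mean_p) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined when both sequences are the same constant")
    return 2 * cov / denominator


if __name__ == "__main__":
    # Hypothetical toy values: a constant offset keeps the Pearson
    # correlation at 1 but lowers the CCC, because CCC penalises bias.
    print(concordance_correlation([1, 2, 3], [2, 3, 4]))
```

Unlike the Pearson correlation, the CCC drops when predictions are systematically shifted or rescaled, which is why it suits checking predicted against measured drug responses.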

    Identification of a Kinase Profile that Predicts Chromosome Damage Induced by Small Molecule Kinase Inhibitors

    Kinases are heavily pursued pharmaceutical targets because of their mechanistic role in many diseases. Small molecule kinase inhibitors (SMKIs) are a compound class that includes marketed drugs and compounds in various stages of drug development. While effective, many SMKIs have been associated with toxicity including chromosomal damage. Screening for kinase-mediated toxicity as early as possible is crucial, as is a better understanding of how off-target kinase inhibition may give rise to chromosomal damage. To that end, we employed a competitive binding assay and an analytical method to predict the toxicity of SMKIs. Specifically, we developed a model based on the binding affinity of SMKIs to a panel of kinases to predict whether a compound tests positive for chromosome damage. As training data, we used the binding affinity of 113 SMKIs against a representative subset of all kinases (290 kinases), yielding a 113×290 data matrix. Additionally, these 113 SMKIs were tested for genotoxicity in an in vitro micronucleus test (MNT). Among a variety of models from our analytical toolbox, we selected using cross-validation a combination of feature selection and pattern recognition techniques: Kolmogorov-Smirnov/T-test hybrid as a univariate filter, followed by Random Forests for feature selection and Support Vector Machines (SVM) for pattern recognition. Feature selection identified 21 kinases predictive of MNT. Using the corresponding binding affinities, the SVM could accurately predict MNT results with 85% accuracy (68% sensitivity, 91% specificity). This indicates that kinase inhibition profiles are predictive of SMKI genotoxicity. While in vitro testing is required for regulatory review, our analysis identified a fast and cost-efficient method for screening out compounds earlier in drug development. 
    Equally important, by identifying a panel of kinases predictive of genotoxicity, we provide medicinal chemists a set of kinases to avoid when designing compounds, thereby providing a basis for rational drug design away from genotoxicity.
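
The accuracy, sensitivity, and specificity quoted in this abstract are related by simple arithmetic, sketched below. The function names and toy labels are hypothetical. The prevalence check assumes that all three reported figures come from the same evaluation set, which the abstract does not state.

```python
def confusion_metrics(true_labels, predicted_labels):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive).

    Assumption: both classes occur in true_labels, so no division by zero.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def implied_prevalence(accuracy, sensitivity, specificity):
    """Fraction of positives implied by the three metrics.

    Follows from accuracy = sensitivity * p + specificity * (1 - p).
    """
    return (specificity - accuracy) / (specificity - sensitivity)


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the study.
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    guess = [1, 1, 0, 0, 0, 0, 0, 1]
    print(confusion_metrics(truth, guess))
    # Figures quoted in the abstract: 85% accuracy, 68% sensitivity, 91% specificity.
    print(implied_prevalence(0.85, 0.68, 0.91))
```

Under that assumption, the quoted figures would correspond to roughly a quarter of the compounds testing positive, which is consistent with accuracy sitting closer to specificity than to sensitivity.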

    Integrative analysis identifies candidate tumor microenvironment and intracellular signaling pathways that define tumor heterogeneity in NF1

    Neurofibromatosis type 1 (NF1) is a monogenic syndrome that gives rise to numerous symptoms including cognitive impairment, skeletal abnormalities, and growth of benign nerve sheath tumors. Nearly all NF1 patients develop cutaneous neurofibromas (cNFs), which occur on the skin surface, whereas 40-60% of patients develop plexiform neurofibromas (pNFs), which are deeply embedded in the peripheral nerves. Patients with pNFs have a ~10% lifetime chance of these tumors becoming malignant peripheral nerve sheath tumors (MPNSTs). These tumors have a severe prognosis and few treatment options other than surgery. Given the lack of therapeutic options available to patients with these tumors, identification of druggable pathways or other key molecular features could aid ongoing therapeutic discovery studies. In this work, we used statistical and machine learning methods to analyze 77 NF1 tumors with genomic data to characterize key signaling pathways that distinguish these tumors and identify candidates for drug development. We identified subsets of latent gene expression variables that may be important in the identification and etiology of cNFs, pNFs, other neurofibromas, and MPNSTs. Furthermore, we characterized the association between these latent variables and genetic variants, immune deconvolution predictions, and protein activity predictions.

    Prediction of peptide and protein propensity for amyloid formation

    Understanding which peptides and proteins have the potential to undergo amyloid formation and what driving forces are responsible for amyloid-like fiber formation and stabilization remains limited. This is mainly because proteins that can undergo structural changes, which lead to amyloid formation, are quite diverse and share no obvious sequence or structural homology, despite the structural similarity found in the fibrils. To address these issues, a novel approach based on recursive feature selection and feed-forward neural networks was undertaken to identify key features highly correlated with the self-assembly problem. This approach allowed the identification of seven physicochemical and biochemical properties of the amino acids highly associated with the self-assembly of peptides and proteins into amyloid-like fibrils (normalized frequency of β-sheet, normalized frequency of β-sheet from LG, weights for β-sheet at the window position of 1, isoelectric point, atom-based hydrophobic moment, helix termination parameter at position j+1 and ΔGº values for peptides extrapolated in 0 M urea). Moreover, these features enabled the development of a new predictor (available at http://cran.r-project.org/web/packages/appnn/index.html) capable of accurately and reliably predicting the amyloidogenic propensity from the polypeptide sequence alone with a prediction accuracy of 84.9% against an external validation dataset of sequences with experimental in vitro evidence of amyloid formation.

    A Copula Based Approach for Design of Multivariate Random Forests for Drug Sensitivity Prediction

    Modeling sensitivity to drugs based on genetic characterizations is a significant challenge in the area of systems medicine. Ensemble based approaches such as Random Forests have been shown to perform well in both individual sensitivity prediction studies and team science based prediction challenges. However, Random Forests generate a deterministic predictive model for each drug based on the genetic characterization of the cell lines and ignore the relationship between different drug sensitivities during model generation. This application motivates the need for generation of multivariate ensemble learning techniques that can increase prediction accuracy and improve variable importance ranking by incorporating the relationships between different output responses. In this article, we propose a novel cost criterion that captures the dissimilarity in the output response structure between the training data and node samples as the difference in the two empirical copulas. We illustrate that copulas are suitable for capturing the multivariate structure of output responses independent of the marginal distributions and the copula based multivariate random forest framework can provide higher accuracy prediction and improved variable selection. The proposed framework has been validated on genomics of drug sensitivity for cancer and cancer cell line encyclopedia database.
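
The idea of comparing two empirical copulas can be sketched for the bivariate case. This is only an illustration under stated assumptions: the abstract does not give the paper's exact cost criterion, so the mean absolute difference on a regular grid used here is a stand-in, and all function names and toy data are hypothetical. Ties in the data are assumed absent.

```python
from fractions import Fraction


def ranks(values):
    """1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def empirical_copula(pairs, u, v):
    """Share of points whose scaled ranks are at most u and v respectively."""
    n = len(pairs)
    rank_x = ranks([x for x, _ in pairs])
    rank_y = ranks([y for _, y in pairs])
    return sum(1 for rx, ry in zip(rank_x, rank_y) if rx <= u * n and ry <= v * n) / n


def copula_distance(pairs_a, pairs_b, grid=4):
    """Mean absolute difference of two empirical copulas on a grid.

    Stand-in dissimilarity for illustration only; exact fractions are
    used for the grid points to avoid floating-point edge effects.
    """
    total = 0.0
    for i in range(1, grid + 1):
        for j in range(1, grid + 1):
            u, v = Fraction(i, grid), Fraction(j, grid)
            total += abs(empirical_copula(pairs_a, u, v) - empirical_copula(pairs_b, u, v))
    return total / grid ** 2


if __name__ == "__main__":
    # Hypothetical responses to two drugs across four cell lines.
    rising = [(1, 10), (2, 20), (3, 30), (4, 40)]
    falling = [(1, 40), (2, 30), (3, 20), (4, 10)]
    print(copula_distance(rising, rising), copula_distance(rising, falling))
```

Because only ranks enter the calculation, rescaling either response leaves the copula unchanged, which matches the abstract's point that copulas capture dependence independently of the marginal distributions.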

    Development of Estrogen Receptor Beta Binding Prediction Model Using Large Sets of Chemicals

    We developed an ERβ binding prediction model to facilitate, together with our previously developed ERα binding model, the identification of chemicals that specifically bind ERβ or ERα. Decision Forest was used to train the ERβ binding prediction model based on a large set of compounds obtained from EADB. Model performance was estimated through 1000 iterations of 5-fold cross validations. Prediction confidence was analyzed using predictions from the cross validations. Informative chemical features for ERβ binding were identified through analysis of the frequency data of chemical descriptors used in the models in the 5-fold cross validations. 1000 permutations were conducted to assess the chance correlation. The average accuracy of 5-fold cross validations was 93.14% with a standard deviation of 0.64%. Prediction confidence analysis indicated that the higher the prediction confidence the more accurate the predictions. Permutation testing results revealed that the prediction model is unlikely to have been generated by chance. Eighteen informative descriptors were identified to be important to ERβ binding prediction. Application of the prediction model to the data from ToxCast project yielded very high sensitivity of 90-92%. Our results demonstrated that ERβ binding of chemicals could be accurately predicted using the developed model. Coupled with our previously developed ERα prediction model, this model could be expected to facilitate drug development through identification of chemicals that specifically bind ERβ or ERα.
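
The repeated 5-fold cross-validation described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' Decision Forest procedure: the threshold classifier, the function names, and the toy data are all hypothetical, and the accuracy is pooled over folds within each repeat before the mean and standard deviation are taken across repeats.

```python
import random
from statistics import fmean, stdev


def kfold_indices(n_samples, k, rng):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def repeated_cv_accuracy(features, labels, fit, predict, k=5, repeats=10, seed=0):
    """Mean and standard deviation of pooled k-fold accuracy over repeats."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        correct = 0
        for test_idx in kfold_indices(len(labels), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
            correct += sum(1 for i in test_idx if predict(model, features[i]) == labels[i])
        accuracies.append(correct / len(labels))
    return fmean(accuracies), stdev(accuracies)


def fit_threshold(xs, ys):
    """Toy classifier (hypothetical): midpoint between the two class means."""
    mean_neg = fmean(x for x, y in zip(xs, ys) if y == 0)
    mean_pos = fmean(x for x, y in zip(xs, ys) if y == 1)
    return (mean_neg + mean_pos) / 2


def predict_threshold(threshold, x):
    """Predict the positive class when x exceeds the fitted threshold."""
    return 1 if x > threshold else 0


if __name__ == "__main__":
    # Hypothetical, well-separated one-dimensional data.
    xs = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
    ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(repeated_cv_accuracy(xs, ys, fit_threshold, predict_threshold, repeats=3))
```

Repeating the shuffle many times, as the abstract reports, gives a spread of accuracies rather than a single number, which is what makes a reported standard deviation meaningful.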

    More than just hormones: H295R cells as predictors of reproductive toxicity

    Many of the commonly observed reproductive toxicities associated with therapeutic compounds can be traced to a disruption of the steroidogenic pathway. We sought to develop an in vitro assay that would predict reproductive toxicity and be high throughput in nature. H295R cells, previously validated as having an intact and functional steroidogenic pathway, were treated with 83 known-positive and 79 known-negative proprietary and public-domain compounds. The assay measured the expression of the key enzymes STAR, 3βHSD2, CYP17A1, CYP11B2, CYP19A1, CYP21A2, and CYP11A1 and the hormones DHEA, progesterone, testosterone, and cortisol. We found that a Random Forest model yielded a receiver operating characteristic area under the curve (ROC AUC) of 0.845, with sensitivity of 0.724 and specificity of 0.758 for predicting in vivo reproductive toxicity with this in vitro assay system.
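
The ROC AUC reported in this abstract has a simple pairwise interpretation, sketched below: it is the probability that a randomly chosen positive compound receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and toy scores are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC computed by comparing every positive score with every negative one.

    labels use 1 for positive and 0 for negative; a tie counts as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical model scores for four compounds.
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

This quadratic-time form is meant for clarity on small inputs; a rank-based computation gives the same value more efficiently on large ones.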

    Prediction of drug-drug interaction potential using machine learning approaches

    Drug discovery is a long, expensive, and complex, yet crucial process for the benefit of society. Selecting potential drug candidates requires an understanding of how well a compound will perform at its task, and more importantly, how safe the compound will be in patients. A key safety insight is understanding a molecule's potential for drug-drug interactions. The metabolism of many drugs is mediated by members of the cytochrome P450 superfamily, notably, the CYP3A4 enzyme. Inhibition of these enzymes can alter the bioavailability of other drugs, potentially increasing their levels to toxic amounts. Four models were developed to predict CYP3A4 inhibition: logistic regression, random forests, support vector machine, and neural network. Two novel convolutional approaches were explored for data featurization: SMILES string auto-extraction and 2D structure auto-extraction. The logistic regression model achieved an accuracy of 83.2%, the random forests model, 83.4%, the support vector machine model, 81.9%, and the neural network model, 82.3%. Additionally, the model built with SMILES string auto-extraction had an accuracy of 82.3%, and the model with 2D structure auto-extraction, 76.4%. The advantages of the novel featurization methods are their ability to learn relevant features from compound SMILES strings, eliminating feature engineering. The developed methodologies can be extended towards predicting any structure-activity relationship and fitted for other areas of drug discovery and development.