14 research outputs found

    Leveraging protein quaternary structure to identify oncogenic driver mutations.

    BACKGROUND: Identifying key "driver" mutations which are responsible for tumorigenesis is critical in the development of new oncology drugs. Due to multiple pharmacological successes in treating cancers that are caused by such driver mutations, a large body of methods has been developed to differentiate these mutations from the benign "passenger" mutations which occur in the tumor but do not further progress the disease. Under the hypothesis that driver mutations tend to cluster in key regions of the protein, the development of algorithms that identify these clusters has become a critical area of research. RESULTS: We have developed a novel methodology, QuartPAC (Quaternary Protein Amino acid Clustering), that identifies non-random mutational clustering while utilizing the protein quaternary structure in 3D space. By integrating the spatial information in the Protein Data Bank (PDB) and the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC), QuartPAC is able to identify clusters which are otherwise missed in a variety of proteins. The R package is available on Bioconductor at: http://bioconductor.jp/packages/3.1/bioc/html/QuartPAC.html. CONCLUSION: QuartPAC provides a unique tool to identify mutational clustering while accounting for the complete folded protein quaternary structure. This work was supported in part by NSF Grant DMS 1106738 (GR, HZ); NIH Grants GM59507 and CA154295 (HZ), and GM102869 (YM); and Wellcome Trust Grant 101908/Z/13/Z (YM).

    From a Large Language Model to Three-Dimensional Sentiment

    We present an automated model of sentiment which assigns to any arbitrary text three values: valence, arousal, and confidence (VAC) and is based on the three-dimensional framework of emotion introduced by Mehrabian and Russell [1]. While such a model is potentially valuable in any situation where a nuanced measurement of sentiment is important, our motivation was to quantify the dialog from psychological therapy and support sessions. The VAC scores are real values that lie between -1 and 1, and thus the output lies in the 2×2×2 cube defined by the three dimensions. Internally, the model uses a convex combination of points in the cube with weights that are obtained from a publicly available zero-shot classifier built from the BART large language model (LLM) that has been fine-tuned on the Multi-Genre Natural Language Inference (MNLI) dataset. The classes used to produce weights for the convex combination were obtained through prompt engineering. In addition to describing the model and our approach to defining the classes, we show that the VAC model scores are strongly correlated with scores provided by human raters on individual words, and that the model is arguably better than human raters on the dimension of confidence. By leveraging an LLM, it can process text of any size, is sensitive to subtlety and idiom, can be updated as language and technology evolve, and can produce meaningful results for arbitrary sentence-length inputs. To illustrate a real-world application, we use the model to evaluate sentences spoken during a psychological therapy session.
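
The convex-combination step described in this abstract can be illustrated with a minimal sketch. The class names and anchor coordinates below are hypothetical placeholders (the abstract does not list the actual classes or their positions in the cube), and the classifier scores are passed in as plain numbers rather than computed by a BART/MNLI model.

```python
from typing import Dict, Tuple

Point = Tuple[float, float, float]  # (valence, arousal, confidence)

# Hypothetical anchors in the [-1, 1]^3 cube; not taken from the paper.
ANCHORS: Dict[str, Point] = {
    "joyful": (1.0, 0.5, 0.5),
    "angry": (-1.0, 1.0, 0.5),
    "hopeless": (-1.0, -0.5, -1.0),
    "calm": (0.5, -1.0, 0.5),
}


def vac_score(class_scores: Dict[str, float],
              anchors: Dict[str, Point] = ANCHORS) -> Point:
    """Return the convex combination of anchor points weighted by class scores.

    class_scores stands in for zero-shot classifier outputs; the scores are
    rescaled to sum to 1 so the result stays inside the cube spanned by the
    anchors.
    """
    raw = {name: class_scores.get(name, 0.0) for name in anchors}
    if any(score < 0 for score in raw.values()):
        raise ValueError("class scores must be non-negative")
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("at least one class score must be positive")
    weights = {name: score / total for name, score in raw.items()}
    # Weighted sum along each of the three dimensions.
    valence, arousal, confidence = (
        sum(weights[name] * anchors[name][dim] for name in anchors)
        for dim in range(3)
    )
    return (valence, arousal, confidence)
```

Because the weights are non-negative and sum to 1, every output is guaranteed to lie inside the cube whenever the anchors do, which is the property the abstract relies on.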

    Computational predictions of the site of metabolism of cytochrome P450 2D6 substrates: comparative analysis, molecular docking, bioactivation and toxicological implications

    Cytochrome P450 2D6 (CYP2D6) is a polymorphic enzyme responsible for metabolizing approximately 25% of all drugs. CYP2D6 is highly expressed in the brain and plays a role as the major CYP in the metabolism of numerous brain-penetrant drugs, including antipsychotics and antidepressants. CYP2D6 activity and inhibition have been associated with numerous undesirable effects in patients, such as bioactivation, drug-associated suicidality and prolongation of the QTc interval. Several in silico tools have been developed in recent years to assist safety assessment scientists in predicting the structural identity of CYP2D6-derived metabolites. The first goal of this study was to perform a comparative evaluation of the ability of four commonly used in silico tools (MetaSite, StarDrop, SMARTCyp and RS-WebPredictor) to correctly predict the CYP2D6-derived site of metabolism (SOM) for 141 compounds, including 10 derived from the Genentech small molecule library. The second goal was to evaluate whether a bioactivation prediction model, based on an indicator of chemical reactivity (E_LUMO–E_HOMO) and electrostatic potential, could correctly predict five representative compounds known to be bioactivated by CYP2D6. Such a model would be of great utility in safety assessment since unforeseen toxicities of CYP2D6 substrates may in part be due to bioactivation mechanisms. The third and final goal was to investigate whether molecular docking, using the crystal structure of human CYP2D6, had the potential to complement or improve the results obtained from the four SOM in silico programs.

    Comparative evaluation of 11 in silico models for the prediction of small molecule mutagenicity: role of steric hindrance and electron-withdrawing groups

    The goal of this investigation was to perform a comparative analysis of how accurately 11 routinely used in silico programs predicted the mutagenicity of test compounds that contained either bulky or electron-withdrawing substituents. To our knowledge this is the first study of its kind in the literature. Such substituents are common in many pharmaceutical agents, so there is a significant need for reliable in silico programs to predict precisely whether they truly pose a risk for mutagenicity. The predictions from each program were compared to experimental data derived from the Ames II test, a rapid reverse mutagenicity assay with a high degree of agreement with the traditional Ames assay. Eleven in silico programs were evaluated and compared: Derek for Windows, Derek Nexus, Leadscope Model Applier (LSMA), LSMA featuring the in vitro microbial Escherichia coli–Salmonella typhimurium TA102 A-T Suite (LSMA+), TOPKAT, CAESAR, TEST, ChemSilico (±S9 suites), MC4PC and a novel DNA docking model. The presence of bulky or electron-withdrawing functional groups in the vicinity of a mutagenic toxicophore in the test compounds clearly affected the ability of each in silico model to predict non-mutagenicity correctly. This was because of an overreliance on the part of the programs on providing mutagenicity alerts when a particular toxicophore is present, irrespective of the structural environment surrounding the toxicophore. From this investigation it can be concluded that these models provide a high degree of specificity (ranging from 71% to 100%) and are generally conservative in their predictions in terms of sensitivity (ranging from 5% to 78%). These values are in general agreement with most other comparative studies in the literature. Interestingly, the DNA docking model was the most sensitive model evaluated, suggesting a potentially useful new mode of screening for mutagens. Another important finding was that the combination of a quantitative structure–activity relationship and an expert rules system appeared to offer little advantage in terms of sensitivity, despite the requirement for such a screening paradigm under the ICH M7 regulatory guideline.
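
For readers unfamiliar with the two metrics reported in this abstract, the following minimal sketch shows how sensitivity and specificity are conventionally computed from binary predictions and experimental outcomes. The function name and the boolean encoding (True for mutagenic) are assumptions made for illustration; the study's own evaluation code is not described in the abstract.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(predicted: Sequence[bool],
                            observed: Sequence[bool]) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) for binary mutagenicity calls.

    predicted[i] is a program's call for compound i and observed[i] is the
    experimental result, with True meaning mutagenic in both sequences.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)          # mutagens flagged
    fn = sum(1 for p, o in pairs if not p and o)      # mutagens missed
    tn = sum(1 for p, o in pairs if not p and not o)  # non-mutagens cleared
    fp = sum(1 for p, o in pairs if p and not o)      # non-mutagens flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one mutagen and one non-mutagen")
    return tp / (tp + fn), tn / (tn + fp)
```

Sensitivity is the fraction of experimentally mutagenic compounds that a program flags, and specificity is the fraction of non-mutagenic compounds that it correctly clears.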

    Psilocybin Therapy for Treatment Resistant Depression: Prediction of Clinical Outcome by Natural Language Processing

    Background: Therapeutic administration of psychedelic drugs has shown significant potential in historical accounts and in recent clinical trials in the treatment of depression and other mood disorders. A recent randomized double-blind phase-IIb study demonstrated the safety and efficacy of COMP360, COMPASS Pathways’ proprietary synthetic formulation of psilocybin, in participants with treatment-resistant depression. While promising, the treatment works for only a portion of the population, and early prediction of outcome is a key objective. Methods: Transcripts were made from audio recordings of the psychological support session between participant and therapist one day post COMP360 administration. A zero-shot machine learning classifier based on the BART large language model was used to compute two-dimensional sentiment (valence and arousal) for the participant and therapist from the transcript. These scores, combined with the Emotional Breakthrough Index (EBI) and treatment arm, were used to predict treatment outcome as measured by MADRS scores. Code and data are available at https://github.com/compasspathways/Sentiment2D. Results: Two multinomial logistic regression models were fit to predict responder status at week 3 and through week 12. Cross-validation of these models resulted in 85% and 88% accuracy and AUC values of 88% and 85%. Conclusions: A machine learning algorithm using NLP and EBI accurately predicts long-term patient response, allowing rapid prognostication of personalized response to psilocybin treatment and insight into therapeutic model optimization. Further research is required to understand whether language data from earlier stages in the therapeutic process hold similar predictive power.

    Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes

    Cancers exhibit extensive mutational heterogeneity, and the resulting long-tail phenomenon complicates the discovery of genes and pathways that are significantly mutated in cancer. We perform a pan-cancer analysis of mutated networks in 3,281 samples from 12 cancer types from The Cancer Genome Atlas (TCGA) using HotNet2, a new algorithm to find mutated subnetworks that overcomes the limitations of existing single-gene, pathway and network approaches. We identify 16 significantly mutated subnetworks that comprise well-known cancer signaling pathways as well as subnetworks with less characterized roles in cancer, including cohesin, condensin and others. Many of these subnetworks exhibit co-occurring mutations across samples. These subnetworks contain dozens of genes with rare somatic mutations across multiple cancers; many of these genes have additional evidence supporting a role in cancer. By illuminating these rare combinations of mutations, pan-cancer network analyses provide a roadmap to investigate new diagnostic and therapeutic opportunities across cancer types. This work is supported by US National Science Foundation (NSF) grant IIS-1016648 and US National Institutes of Health (NIH) grants R01HG005690, R01HG007069 and R01CA180776 to B.J.R. and by National Human Genome Research Institute (NHGRI) grant U01HG006517 to L.D. B.J.R. is supported by a Career Award at the Scientific Interface from the Burroughs Wellcome Fund, an Alfred P. Sloan Research Fellowship and an NSF CAREER Award (CCF-1053753). M.D.M.L. is supported by NSF fellowship GRFP DGE 022824.