7 research outputs found

    DDGun: an untrained predictor of protein stability changes upon amino acid variants

    Estimating the functional effect of single amino acid variants in proteins is fundamental for predicting the change in thermodynamic stability, measured as the difference in the Gibbs free energy of unfolding between the wild-type and the variant protein (ΔΔG). Here, we present the web server of the DDGun method, which was previously developed for ΔΔG prediction upon amino acid variants. DDGun is an untrained method based on basic features derived from evolutionary information. It is antisymmetric: it predicts opposite ΔΔG values for direct (A → B) and reverse (B → A) variants, at both single and multiple sites. DDGun is available in two versions, one based on sequence information only and the other on sequence and structure information. Despite being untrained, DDGun reaches prediction performance comparable to that of trained methods. For the web server version, we updated the protein sequence database used to compute the evolutionary features, and we compiled two new data sets of protein variants for a blind test of its performance. On these blind data sets of single and multiple site variants, DDGun confirms its prediction performance, reaching an average correlation coefficient between experimental and predicted ΔΔG of 0.45 and 0.49 for the sequence-based and structure-based versions, respectively. Besides its use for ΔΔG prediction, we suggest that DDGun be adopted as a benchmark method to assess the predictive capabilities of newly developed methods. Releasing DDGun as a web server, stand-alone program and Docker image will facilitate the method comparison needed to improve ΔΔG prediction.
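
    The two quantities this abstract relies on, antisymmetry and the correlation between experimental and predicted ΔΔG, can be illustrated with a minimal sketch. The scoring function and its per-residue values below are hypothetical stand-ins chosen for illustration; they are not DDGun's evolutionary features or its API.

```python
from math import sqrt

# Hypothetical per-residue scores, for illustration only (NOT DDGun's features).
TOY_SCALE = {"A": 1.8, "G": -0.4, "L": 3.8, "D": -3.5, "K": -3.9}


def toy_ddg(wild_type: str, variant: str) -> float:
    """Toy predictor, antisymmetric by construction: swapping the
    wild-type and variant residues flips the sign of the result."""
    return TOY_SCALE[variant] - TOY_SCALE[wild_type]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def antisymmetry_bias(direct, reverse):
    """Mean of (direct + reverse) predictions; zero for a perfectly
    antisymmetric predictor."""
    return sum(d + r for d, r in zip(direct, reverse)) / len(direct)


pairs = [("A", "L"), ("G", "D"), ("K", "A")]
direct = [toy_ddg(wt, mut) for wt, mut in pairs]
reverse = [toy_ddg(mut, wt) for wt, mut in pairs]
print(antisymmetry_bias(direct, reverse))  # zero for this toy predictor
```

    A benchmark in this spirit would report `pearson(experimental, predicted)` on a blind set together with the antisymmetry bias on paired direct and reverse variants.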

    Deep learning methods to predict amyotrophic lateral sclerosis disease progression

    Amyotrophic lateral sclerosis (ALS) is a highly complex and heterogeneous neurodegenerative disease that affects motor neurons. Since life expectancy is relatively low, it is essential to understand the course of the disease promptly in order to better target the patient’s treatment. Predictive models for disease progression are thus of great interest. One of the most extensive and well-studied open-access data resources for ALS is the Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) repository. In 2015, the DREAM-Phil Bowen ALS Prediction Prize4Life Challenge was held on PRO-ACT data; competitors were asked to develop machine learning algorithms to predict disease progression, measured as the slope of the ALSFRS score between 3 and 12 months. Although deep learning has already been applied successfully in several studies on ALS patients, to the best of our knowledge it remains unexplored for ALSFRS slope prediction in the PRO-ACT cohort. Here, we investigate how deep learning models perform in predicting ALS progression using the PRO-ACT data. We developed three models based on different architectures that showed comparable or better performance than the state-of-the-art models, and thus represent a valid alternative for predicting ALS disease progression.
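
    The prediction target named in this abstract, the ALSFRS slope between 3 and 12 months, can be sketched as a two-point rate of change. This is one simple reading of the target, assumed here for illustration; the exact visit-selection rules of the challenge are not given in the abstract, and the patient values below are hypothetical.

```python
def alsfrs_slope(t_start, score_start, t_end, score_end):
    """Change in ALSFRS score per month between two visits.

    Times are in months; a negative slope means functional decline.
    """
    if t_end <= t_start:
        raise ValueError("t_end must be later than t_start")
    return (score_end - score_start) / (t_end - t_start)


# Hypothetical patient: score 36 at month 3 and 27 at month 12.
slope = alsfrs_slope(3.0, 36.0, 12.0, 27.0)
print(f"{slope:.2f} points/month")  # -1.00 points/month
```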

    The role of the amino acid sequence in protein phase separation

    The central topic of this thesis is the physics of proteins. We present several models that describe the mechanisms and interactions at work between amino acid chains. After a brief general introduction to proteins, and following an article published in 2017 in the Journal of Molecular Liquids [4], we consider a particular protein, Ddx4^{N1}. It belongs to the broader category of intrinsically disordered proteins (IDPs), that is, proteins composed mainly of polar and charged amino acids that lack a well-defined, stable three-dimensional structure. We show that classical Flory-Huggins theory is not sufficient to describe the behaviour of this protein: although the model describes standard proteins very well, it must be extended to account for the Coulomb interactions between charged residues. Using the random phase approximation (RPA), we study the original Ddx4^{N1} protein and its mutant, Ddx4_{CS} (charge scrambled), showing that phase separation in aqueous solution depends strongly on the arrangement of the amino acids within the sequence. Ddx4, like other IDPs, can undergo liquid-liquid phase separation, passing from a water-soluble to a water-insoluble state; this process forms droplets that can act as components of membraneless organelles (including nucleoli, Cajal bodies and stress granules). These organelles play important roles in gene regulation, homeostasis and the cell cycle. Understanding the biophysics of IDP phase separation is therefore relevant not only biologically but also medically. The main objective of the thesis is to reproduce the phase diagrams presented in the reference article.
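
    For orientation, the classical Flory-Huggins free energy of mixing per lattice site that this abstract takes as its starting point can be written in its standard textbook form. The symbols are assumptions of this sketch rather than notation taken from the thesis: φ is the polymer volume fraction, N the chain length in lattice units, and χ the interaction parameter.

```latex
\frac{f_{\mathrm{FH}}}{k_B T}
  = \frac{\phi}{N}\,\ln\phi
  + (1-\phi)\,\ln(1-\phi)
  + \chi\,\phi\,(1-\phi)
```

    The RPA extension described above adds an electrostatic contribution that depends on how the charges are arranged along the sequence; its explicit form is not given in the abstract and is not reproduced here.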

    DataSheet1_Unravelling the instability of mutational signatures extraction via archetypal analysis.PDF

    The high cosine similarity between some single-base substitution mutational signatures and their characteristic flat profiles could suggest the presence of overfitting and mathematical artefacts. The newest version (v3.3) of the signature database available in the Catalogue Of Somatic Mutations In Cancer (COSMIC) provides a collection of 79 mutational signatures, more than double the number in the previous version (30 profiles in COSMIC signatures v2), which makes the associations between signatures and specific mutagenic processes more critical. This study provides a systematic assessment of the de novo extraction task through simulation scenarios based on the latest version of the COSMIC signatures. It also highlights, through a novel approach using archetypal analysis, which COSMIC signatures are redundant and more likely to be mathematical artefacts. Twenty-nine archetypes were able to reconstruct the profiles of all the COSMIC signatures with cosine similarity >0.8. Interestingly, these archetypes tend to group similar original signatures that share either the same aetiology or similar biological processes. We believe that these findings will encourage the development of new de novo extraction methods that avoid redundancy of information among the signatures while preserving their biological interpretation.
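
    The similarity measure this abstract uses throughout can be sketched in a few lines. The example assumes the usual 96-channel representation of single-base substitution profiles; the two profiles are synthetic and are not COSMIC signatures.

```python
from math import sqrt


def cosine_similarity(p, q):
    """Cosine of the angle between two profiles of equal length."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = sqrt(sum(a * a for a in p))
    norm_q = sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        raise ValueError("profiles must be non-zero")
    return dot / (norm_p * norm_q)


N_CHANNELS = 96  # trinucleotide contexts of single-base substitutions

# Synthetic profiles: a perfectly flat one and a mildly perturbed one.
flat = [1.0 / N_CHANNELS] * N_CHANNELS
nearly_flat = [f * (1.1 if i % 2 == 0 else 0.9) for i, f in enumerate(flat)]

similarity = cosine_similarity(flat, nearly_flat)
print(similarity > 0.8)  # True: two flat-ish profiles are hard to tell apart
```

    The example shows why flat profiles are problematic: a 10% perturbation of every channel still leaves the similarity far above the 0.8 threshold quoted above.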

    Machine-learning based prediction of in-hospital death for patients with takotsubo syndrome: the InterTAK-ML model

    Aims: Takotsubo syndrome (TTS) is associated with a substantial rate of adverse events. We sought to design a machine-learning (ML) based model to predict the risk of in-hospital death and to cluster TTS patients in order to identify different risk profiles. Methods and results: A ridge logistic regression-based ML model for predicting in-hospital death was developed on 3482 TTS patients from the International Takotsubo Registry, randomly split into a training and an internal validation cohort (75% and 25% of the sample, respectively), and evaluated in an external validation cohort (1037 patients). Thirty-one clinically relevant variables were included in the prediction model. Model performance was the primary endpoint and was assessed by the area under the receiver-operating characteristic curve (AUC), sensitivity and specificity. As a secondary endpoint, a K-medoids clustering algorithm was designed to stratify patients into phenotypic groups based on the ten most relevant features emerging from the main model. The overall incidence of in-hospital death was 5.2%. For in-hospital death prediction, the InterTAK-ML model showed an AUC of 0.89 (0.85-0.92), a sensitivity of 0.85 (0.78-0.95) and a specificity of 0.76 (0.74-0.79) in the internal validation cohort, and an AUC of 0.82 (0.73-0.91), a sensitivity of 0.74 (0.61-0.87) and a specificity of 0.79 (0.77-0.81) in the external cohort. By exploiting the 10 variables showing the highest feature importance, TTS patients were clustered into six groups associated with different risks of in-hospital death (28.8% vs 15.5% vs 5.4% vs 0.8% vs 0.5%), which were also consistent in the external cohort. Conclusion: An ML-based approach for the identification of TTS patients at risk of adverse short-term prognosis is feasible and effective. The InterTAK-ML model showed unprecedented discriminative capability for the prediction of in-hospital death.
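
    The three metrics reported in this abstract can be computed from labels and predicted risks as in the sketch below. It is a generic illustration on made-up data, not the InterTAK-ML pipeline; the function names and the 0.5 decision threshold are assumptions of this example.

```python
def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity (true-positive rate) and specificity (true-negative rate)
    when a score at or above the threshold is classed as positive."""
    tp = fn = tn = fp = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = in-hospital death, scores = predicted risk.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))                  # 0.75
print(sensitivity_specificity(labels, scores))  # (0.5, 1.0)
```

    Unlike sensitivity and specificity, the AUC does not depend on a decision threshold, which is why the abstract reports all three.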