
    Lytic Water Dynamics Reveal Evolutionarily Conserved Mechanisms of ATP Hydrolysis by TIP49 AAA+ ATPases

    Summary: Eukaryotic TIP49a (Pontin) and TIP49b (Reptin) AAA+ ATPases play essential roles in key cellular processes. How their weak ATPase activity contributes to their important functions remains largely unknown and difficult to analyze because of the divergent properties of TIP49a and TIP49b proteins and of their homo- and hetero-oligomeric assemblies. To circumvent these complexities, we have analyzed the single ancient TIP49 ortholog found in the archaeon Methanopyrus kandleri (mkTIP49). All-atom homology modeling and molecular dynamics simulations validated by biochemical assays reveal highly conserved organizational principles and identify key residues for ATP hydrolysis. An unanticipated crosstalk between Walker B and Sensor I motifs impacts the dynamics of water molecules and highlights a critical role of trans-acting aspartates in the lytic water activation step that is essential for the associative mechanism of ATP hydrolysis

    Conformal efficiency as a metric for comparative model assessment befitting federated learning

    Because larger training volumes improve predictive model quality, leveraging existing external data sources holds the promise of time- and cost-efficiency. In a drug discovery setting, pharmaceutical companies all own substantial but confidential datasets. The MELLODDY project develops a privacy-preserving federated machine learning solution and deploys it at an unprecedented scale (more than 100,000 tasks across ten major pharmaceutical companies), while ensuring the security and privacy of each partner’s sensitive data. Each partner builds models for its own private assays that benefit from a shared representation. Established predictive performance metrics such as AUC ROC or AUC PR are constrained to unseen labelled chemical space and cannot gauge performance gains in unlabelled chemical space. Federated learning indirectly extends labelled space, but in a privacy-preserving context, a partner cannot use this label extension for performance assessment. Metrics that estimate uncertainty on a prediction can be calculated even where no label is known. Practically, the chemical space covered with predictions of sufficient confidence reflects the applicability domain of a model. After establishing a link to established performance metrics, we propose the efficiency from the conformal prediction framework (‘conformal efficiency’) as a proxy to the applicability domain size. A documented extension of the applicability domain would qualify as a tangible benefit from federated learning. In interim assessments, MELLODDY partners report a median increase in conformal efficiency of the federated over the single-partner model of 5.5% (with increases up to 9.7%). Subject to distributional conditions, that efficiency increase can be directly interpreted as the expected increase in conformal, i.e. high-confidence, predictions. In conclusion, we present the first evidence that privacy-preserving federated machine learning across massive drug-discovery datasets from ten pharma partners indeed extends the applicability domain of property prediction models
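
The abstract does not spell out how conformal efficiency is computed. As a rough illustration only, the sketch below assumes a binary activity task, a class-conditional (Mondrian-style) inductive conformal predictor, and efficiency defined as the fraction of compounds whose prediction set contains exactly one label. The function names, the nonconformity score (one minus the predicted class probability), and the toy numbers are all hypothetical and are not taken from the MELLODDY code base. Note that no labels are needed for the compounds being scored, only for the calibration set.

```python
from typing import Dict, List, Sequence


def p_value(cal_scores: Sequence[float], score: float) -> float:
    """Conformal p-value: smoothed share of calibration scores >= the test score."""
    n_ge = sum(1 for s in cal_scores if s >= score)
    return (n_ge + 1) / (len(cal_scores) + 1)


def prediction_set(prob_active: float,
                   cal_scores: Dict[int, Sequence[float]],
                   significance: float) -> List[int]:
    """Labels (0 = inactive, 1 = active) that cannot be rejected at the given significance."""
    labels = []
    for label, prob in ((0, 1.0 - prob_active), (1, prob_active)):
        nonconformity = 1.0 - prob  # assumed nonconformity score (hypothetical choice)
        if p_value(cal_scores[label], nonconformity) > significance:
            labels.append(label)
    return labels


def conformal_efficiency(probs_active: Sequence[float],
                         cal_scores: Dict[int, Sequence[float]],
                         significance: float) -> float:
    """Fraction of compounds with a single-label prediction set (assumed definition)."""
    sets = [prediction_set(p, cal_scores, significance) for p in probs_active]
    return sum(1 for s in sets if len(s) == 1) / len(sets)


# Toy calibration nonconformity scores per class (made-up numbers).
cal = {0: [0.05, 0.15, 0.35, 0.55], 1: [0.05, 0.15, 0.35, 0.55]}

# Predicted activity probabilities for unlabelled compounds (made-up numbers).
unlabelled_probs = [0.9, 0.5, 0.1, 0.6]

efficiency = conformal_efficiency(unlabelled_probs, cal, significance=0.25)
print(efficiency)
```

Under these assumptions, comparing a federated and a single-partner model would amount to calling `conformal_efficiency` once per model on the same unlabelled compounds and reporting the difference.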

    Conformal efficiency as a metric for comparative model assessment befitting federated learning

    In a drug discovery setting, pharmaceutical companies own substantial but confidential datasets. The MELLODDY project developed a privacy-preserving federated machine learning solution and deployed it at an unprecedented scale. Each partner built models for its own private assays that benefitted from a shared representation. Established predictive performance metrics such as AUC ROC or AUC PR are constrained to unseen labeled chemical space and cannot gauge performance gains in unlabeled chemical space. Federated learning indirectly extends labeled space, but in a privacy-preserving context, a partner cannot use this label extension for performance assessment. Metrics that estimate uncertainty on a prediction can be calculated even where no label is known. Practically, the chemical space covered with predictions above an uncertainty threshold reflects the applicability domain of a model. After establishing a link to established performance metrics, we propose the efficiency from the conformal prediction framework (‘conformal efficiency’) as a proxy to the applicability domain size. A documented extension of the applicability domain would qualify as a tangible benefit from federated learning. In interim assessments, MELLODDY partners reported a median increase in conformal efficiency of the federated over the single-partner model of 5.5% (with increases up to 9.7%). Subject to distributional conditions, that efficiency increase can be directly interpreted as the expected increase in conformal, i.e. low-uncertainty, predictions. In conclusion, we present the first indication that privacy-preserving federated machine learning across massive drug-discovery datasets from ten pharma partners indeed extends the applicability domain of property prediction models

    MELLODDY: cross pharma federated learning at unprecedented scale unlocks benefits in QSAR without compromising proprietary information

    Federated multi-partner machine learning can be an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource intensive. In the landmark MELLODDY project, each of ten pharmaceutical companies realized aggregated improvements on its own classification and/or regression models through federated learning. To this end, they leveraged a novel implementation extending multi-task learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma dataset of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point towards an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performances, albeit with saturating return. Markedly higher improvements were observed for pharmacokinetics and safety panel assay-based task subsets

    MELLODDY: Cross-pharma Federated Learning at Unprecedented Scale Unlocks Benefits in QSAR without Compromising Proprietary Information.

    Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets