21 research outputs found

    Disentanglement Learning via Topology

    We propose TopDis (Topological Disentanglement), a method for learning disentangled representations by adding a multi-scale topological loss term. Disentanglement is a crucial property of data representations: it is important for the explainability and robustness of deep learning models and is a step towards high-level cognition. The state-of-the-art VAE-based method minimizes the total correlation of the joint distribution of latent variables. We take a different perspective on disentanglement by analyzing topological properties of data manifolds. In particular, we optimize the topological similarity of data manifold traversals. To the best of our knowledge, our paper is the first to propose a differentiable topological loss for disentanglement. Our experiments show that the proposed topological loss improves disentanglement scores such as MIG, FactorVAE score, SAP score and DCI disentanglement score with respect to state-of-the-art results. Our method works in an unsupervised manner, so it can be applied to problems without labeled factors of variation. Additionally, we show how to use the proposed topological loss to find disentangled directions in a trained GAN.

    Learning Topology-Preserving Data Representations

    We propose a method for learning topology-preserving data representations (dimensionality reduction). The method aims to provide topological similarity between the data manifold and its latent representation by enforcing similarity in topological features (clusters, loops, 2D voids, etc.) and their localization. The core of the method is the minimization of the Representation Topology Divergence (RTD) between the original high-dimensional data and its low-dimensional representation in latent space. RTD minimization provides closeness in topological features with strong theoretical guarantees. We develop a scheme for RTD differentiation and apply it as a loss term for the autoencoder. The proposed method, "RTD-AE", preserves the global structure and topology of the data manifold better than state-of-the-art competitors, as measured by linear correlation, triplet distance ranking accuracy, and Wasserstein distance between persistence barcodes.

    Exploring non-linear distance metrics in the structure–activity space: QSAR models for human estrogen receptor

    Background: Quantitative structure-activity relationship (QSAR) models are important tools used in discovering new drug candidates and identifying potentially harmful environmental chemicals. These models often face two fundamental challenges: a limited amount of available biological activity data, and noise or uncertainty in the activity data themselves. To address these challenges, we introduce and explore a QSAR model based on custom distance metrics in the structure-activity space. Methods: The model is built on top of the k-nearest neighbor model, incorporating non-linearity not only in the chemical structure space, but also in the biological activity space. The model is tuned and evaluated using activity data for human estrogen receptor from the US EPA ToxCast and Tox21 databases. Results: The model closely trails the CERAPP consensus model (built on top of 48 individual human estrogen receptor activity models) in agonist activity predictions and consistently outperforms the CERAPP consensus model in antagonist activity predictions. Discussion: We suggest that incorporating non-linear distance metrics may significantly improve QSAR model performance when the available biological activity data are limited.

    Dynamics of Electron Transfer Pathways in Cytochrome c Oxidase

    Cytochrome c oxidase mediates the final step of electron transfer reactions in the respiratory chain, catalyzing the transfer between cytochrome c and molecular oxygen and concomitantly pumping protons across the inner mitochondrial membrane. We investigate the electron transfer reactions in cytochrome c oxidase, particularly the control of the effective electronic coupling by the nuclear thermal motion. The effective coupling is calculated using the Green's function technique with an extended Hückel-level electronic Hamiltonian, combined with all-atom molecular dynamics of the protein in a native (membrane and solvent) environment. The effective coupling between Cu(A) and heme a is found to be dominated by the pathway that starts from His(B204). The coupling between heme a and heme a(3) is dominated by a through-space jump between the two heme rings rather than by covalent pathways. In both steps, the effective electronic coupling is robust to the thermal nuclear vibrations, thereby providing fast and efficient electron transfer.

    Steering Electrons on Moving Pathways

    Electron transfer (ET) reactions provide a nexus among chemistry, biochemistry, and physics. These reactions underpin the “power plants” and “power grids” of bioenergetics, and they challenge us to understand how evolution manipulates structure to control ET kinetics. Ball-and-stick models for the machinery of electron transfer, however, fail to capture the rich electronic and nuclear dynamics of ET molecules: these static representations disguise, for example, the range of thermally accessible molecular conformations. The influence of structural fluctuations on electron-transfer kinetics is amplified by the exponential decay of electron tunneling probabilities with distance, as well as the delicate interference among coupling pathways. Fluctuations in the surrounding medium can also switch transport between coherent and incoherent ET mechanisms, and may gate ET so that its kinetics is limited by conformational interconversion times, rather than by the intrinsic ET time scale. Moreover, preparation of a charge-polarized donor state or of a donor state with linear or angular momentum can have profound dynamical and kinetic consequences. In this Account, we establish a vocabulary to describe how the conformational ensemble and the prepared donor state influence ET kinetics in macromolecules. This framework is helping to unravel the richness of functional biological ET pathways, which have evolved within fluctuating macromolecular structures. The conceptual framework for describing nonadiabatic ET seems disarmingly simple: compute the ensemble-averaged (mean-squared) donor-acceptor (DA) tunneling interaction, $\langle H_{DA}^2 \rangle$, and the Franck-Condon weighted density of states, $\rho_{FC}$, to describe the rate, $(2\pi/\hbar)\langle H_{DA}^2 \rangle \rho_{FC}$. Modern descriptions of the thermally averaged electronic coupling and of the Franck-Condon factor establish a useful predictive framework in biology, chemistry, and nanoscience.
    Describing the influence of geometric and energetic fluctuations on ET allows us to address a rich array of mechanistic and kinetic puzzles. How strongly is a protein’s fold imprinted on the ET kinetics, and might thermal fluctuations “wash out” signatures of structure? What is the influence of thermal fluctuations on ET kinetics beyond averaging of the tunneling barrier structure? Do electronic coupling mechanisms change as donor and acceptor reposition in a protein, and what are the consequences for the ET kinetics? Do fluctuations access minority species that dominate tunneling? Can energy exchanges between the electron and bridge vibrations generate vibronic signatures that label some of the D-to-A pathways traversed by the electron, thus eliminating unmarked pathways that would otherwise contribute to the DA coupling (as in other “which way” or double-slit experiments)? Might medium fluctuations drive tunneling-hopping mechanistic transitions? How does the donor-state preparation, in particular, its polarization toward the acceptor and its momentum characteristics (which may introduce complex rather than pure real relationships among donor orbital amplitudes), influence the electronic dynamics? In this Account, we describe our recent studies that address puzzling questions of how conformational distributions, excited-state polarization, and electronic and nuclear dynamical effects influence ET in macromolecules. Indeed, conformational and dynamical effects arise in all transport regimes, including the tunneling, resonant transport, and hopping regimes. Importantly, these effects can induce switching among ET mechanisms.
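    The nonadiabatic rate expression quoted in this abstract (2π/ħ times the mean-squared donor-acceptor coupling times the Franck-Condon weighted density of states) can be illustrated with a minimal sketch. The abstract does not say how the Franck-Condon factor is evaluated; the classical high-temperature Marcus form used below is an assumption, as are all function names, parameter names, and the sample coupling values, which are hypothetical and not taken from the Account.

```python
import math

HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV*s
KB_EV_K = 8.617333262e-5     # Boltzmann constant in eV/K


def mean_squared_coupling(couplings_ev):
    """Ensemble average <H_DA^2> over sampled couplings (eV), e.g. one per MD snapshot."""
    if not couplings_ev:
        raise ValueError("need at least one coupling sample")
    return sum(h * h for h in couplings_ev) / len(couplings_ev)


def franck_condon_marcus(delta_g_ev, lambda_ev, temperature_k=298.15):
    """Franck-Condon weighted density of states (1/eV).

    ASSUMPTION: classical high-temperature Marcus expression,
    rho_FC = (4*pi*lambda*kT)^(-1/2) * exp(-(dG + lambda)^2 / (4*lambda*kT)).
    """
    kt = KB_EV_K * temperature_k
    prefactor = 1.0 / math.sqrt(4.0 * math.pi * lambda_ev * kt)
    exponent = -((delta_g_ev + lambda_ev) ** 2) / (4.0 * lambda_ev * kt)
    return prefactor * math.exp(exponent)


def et_rate(couplings_ev, delta_g_ev, lambda_ev, temperature_k=298.15):
    """Nonadiabatic ET rate (1/s): (2*pi/hbar) * <H_DA^2> * rho_FC."""
    h2 = mean_squared_coupling(couplings_ev)
    rho_fc = franck_condon_marcus(delta_g_ev, lambda_ev, temperature_k)
    return (2.0 * math.pi / HBAR_EV_S) * h2 * rho_fc


if __name__ == "__main__":
    # Hypothetical couplings (eV) fluctuating in sign and magnitude across snapshots.
    samples = [1.0e-4, -2.5e-4, 0.5e-4, 3.0e-4]
    print(et_rate(samples, delta_g_ev=-0.3, lambda_ev=0.7))
```

    Because the couplings are squared before averaging, snapshots with opposite signs do not cancel, and a few strongly coupled conformations can dominate the average. That is one way the conformational ensemble enters the rate.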

    FEP Augmentation as a Means to Solve Data Paucity Problems for Machine Learning in Chemical Biology

    In the realm of medicinal chemistry, the primary objective is to swiftly optimize a multitude of chemical properties of a set of compounds to yield a clinical candidate poised for clinical trials. In recent years, two computational techniques, machine learning (ML) and physics-based methods, have evolved substantially and are now frequently incorporated into the medicinal chemist’s toolbox to enhance the efficiency of both hit optimization and candidate design. Both computational methods come with their own set of limitations, and they are often used independently of each other. ML’s capability to screen extensive compound libraries expediently is tempered by its reliance on quality data, which can be scarce, especially during early-stage optimization. In contrast, physics-based approaches like free energy perturbation (FEP) are frequently constrained by low throughput and high cost; however, they are capable of making highly accurate binding affinity predictions. In this study, we harnessed the strength of FEP to overcome data paucity in ML by generating virtual activity data sets which then inform the training of algorithms. Here, we show that ML algorithms trained with an FEP-augmented data set could achieve predictive accuracy comparable to that of algorithms trained on experimental data from biological assays. Throughout the paper, we emphasize key mechanistic considerations that must be taken into account when aiming to augment data sets and lay the groundwork for successful implementation. Ultimately, the study advocates for the synergy of physics-based methods and ML to expedite the lead optimization process. We believe that the physics-based augmentation of ML will significantly benefit drug discovery as these techniques continue to evolve.