Disentanglement Learning via Topology
We propose TopDis (Topological Disentanglement), a method for learning
disentangled representations by adding a multi-scale topological loss term.
Disentanglement is a crucial property of data representations, important for
the explainability and robustness of deep learning models and a step towards
high-level cognition. The state-of-the-art method based on VAE minimizes the
total correlation of the joint distribution of latent variables. We take a
different perspective on disentanglement by analyzing topological properties of
data manifolds. In particular, we optimize the topological similarity of data
manifold traversals. To the best of our knowledge, our paper is the first
to propose a differentiable topological loss for disentanglement. Our
experiments have shown that the proposed topological loss improves
disentanglement scores such as MIG, FactorVAE score, SAP score and DCI
disentanglement score relative to state-of-the-art results. Our method
works in an unsupervised manner, so it can be applied to problems without
labeled factors of variation. Additionally, we show how to use the proposed
topological loss to find disentangled directions in a trained GAN.
Learning Topology-Preserving Data Representations
We propose a method for learning topology-preserving data representations
(dimensionality reduction). The method aims to provide topological similarity
between the data manifold and its latent representation by enforcing the
similarity in topological features (clusters, loops, 2D voids, etc.) and their
localization. The core of the method is the minimization of the Representation
Topology Divergence (RTD) between original high-dimensional data and
low-dimensional representation in latent space. RTD minimization provides
closeness in topological features with strong theoretical guarantees. We
develop a scheme for RTD differentiation and apply it as a loss term for the
autoencoder. The proposed method "RTD-AE" better preserves the global structure
and topology of the data manifold than state-of-the-art competitors as measured
by linear correlation, triplet distance ranking accuracy, and Wasserstein
distance between persistence barcodes.
Exploring non-linear distance metrics in the structure–activity space: QSAR models for human estrogen receptor
Background: Quantitative structure-activity relationship (QSAR) models are important tools used in discovering new drug candidates and identifying potentially harmful environmental chemicals. These models often face two fundamental challenges: the limited amount of available biological activity data and noise or uncertainty in the activity data themselves. To address these challenges, we introduce and explore a QSAR model based on custom distance metrics in the structure-activity space. Methods: The model is built on top of the k-nearest neighbor model, incorporating non-linearity not only in the chemical structure space, but also in the biological activity space. The model is tuned and evaluated using activity data for human estrogen receptor from the US EPA ToxCast and Tox21 databases. Results: The model closely trails the CERAPP consensus model (built on top of 48 individual human estrogen receptor activity models) in agonist activity predictions and consistently outperforms the CERAPP consensus model in antagonist activity predictions. Discussion: We suggest that incorporating non-linear distance metrics may significantly improve QSAR model performance when the available biological activity data are limited.
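As a rough illustration of the k-nearest-neighbor idea described in the abstract above, the following minimal sketch predicts an activity from the nearest neighbors under a non-Euclidean distance. Everything specific here is an assumption for illustration: the abstract does not give the paper's metric, so the sketch uses the Tanimoto distance on binary fingerprints and an exponential neighbor weighting (the hypothetical `sharpness` parameter). It covers only the structure-space side, not the non-linearity in activity space.

```python
import math

def tanimoto_distance(a, b):
    """1 - Tanimoto similarity between two equal-length binary fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero fingerprints are treated as identical.
    return 0.0 if either == 0 else 1.0 - both / either

def knn_predict(query, fingerprints, activities, k=3, sharpness=2.0):
    """Distance-weighted k-NN activity estimate (illustrative, not the paper's model)."""
    # Rank training compounds by distance to the query and keep the k closest.
    ranked = sorted(
        (tanimoto_distance(query, fp), act)
        for fp, act in zip(fingerprints, activities)
    )[:k]
    # Non-linear (exponential) weighting: closer neighbors count far more.
    weights = [math.exp(-sharpness * dist) for dist, _ in ranked]
    weighted_sum = sum(w * act for w, (_, act) in zip(weights, ranked))
    return weighted_sum / sum(weights)

# Toy, made-up fingerprints and activities, purely to exercise the functions.
toy_fps = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]]
toy_acts = [0.9, 0.5, 0.0]
estimate = knn_predict([1, 1, 0, 0], toy_fps, toy_acts, k=3)
```

The exponential weighting is one simple way to make the neighbor contribution non-linear in distance; other kernels would work equally well in this sketch.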
Dynamics of Electron Transfer Pathways in Cytochrome c Oxidase
Cytochrome c oxidase mediates the final step of electron transfer reactions in the respiratory chain, catalyzing the transfer between cytochrome c and molecular oxygen and concomitantly pumping protons across the inner mitochondrial membrane. We investigate the electron transfer reactions in cytochrome c oxidase, particularly the control of the effective electronic coupling by the nuclear thermal motion. The effective coupling is calculated using the Green's function technique with an extended Hückel level electronic Hamiltonian, combined with all-atom molecular dynamics of the protein in a native (membrane and solvent) environment. The effective coupling between Cu(A) and heme a is found to be dominated by the pathway that starts from His(B204). The coupling between heme a and heme a(3) is dominated by a through-space jump between the two heme rings rather than by covalent pathways. In both steps, the effective electronic coupling is robust to the thermal nuclear vibrations, thereby providing fast and efficient electron transfer.
Steering Electrons on Moving Pathways
Electron transfer (ET) reactions provide a nexus among chemistry, biochemistry, and physics. These reactions underpin the "power plants" and "power grids" of bioenergetics, and they challenge us to understand how evolution manipulates structure to control ET kinetics. Ball-and-stick models for the machinery of electron transfer, however, fail to capture the rich electronic and nuclear dynamics of ET molecules: these static representations disguise, for example, the range of thermally accessible molecular conformations. The influence of structural fluctuations on electron-transfer kinetics is amplified by the exponential decay of electron tunneling probabilities with distance, as well as the delicate interference among coupling pathways. Fluctuations in the surrounding medium can also switch transport between coherent and incoherent ET mechanisms, and may gate ET so that its kinetics is limited by conformational interconversion times, rather than by the intrinsic ET time scale. Moreover, preparation of a charge-polarized donor state or of a donor state with linear or angular momentum can have profound dynamical and kinetic consequences. In this Account, we establish a vocabulary to describe how the conformational ensemble and the prepared donor state influence ET kinetics in macromolecules. This framework is helping to unravel the richness of functional biological ET pathways, which have evolved within fluctuating macromolecular structures.
The conceptual framework for describing nonadiabatic ET seems disarmingly simple: compute the ensemble-averaged (mean-squared) donor–acceptor (DA) tunneling interaction, $\langle H_{DA}^2 \rangle$, and the Franck–Condon weighted density of states, $\rho_{FC}$, to describe the rate, $(2\pi/\hbar)\langle H_{DA}^2 \rangle \rho_{FC}$. Modern descriptions of the thermally averaged electronic coupling and of the Franck–Condon factor establish a useful predictive framework in biology, chemistry, and nanoscience. Describing the influence of geometric and energetic fluctuations on ET allows us to address a rich array of mechanistic and kinetic puzzles. How strongly is a protein's fold imprinted on the ET kinetics, and might thermal fluctuations "wash out" signatures of structure? What is the influence of thermal fluctuations on ET kinetics beyond averaging of the tunneling barrier structure? Do electronic coupling mechanisms change as donor and acceptor reposition in a protein, and what are the consequences for the ET kinetics? Do fluctuations access minority species that dominate tunneling? Can energy exchanges between the electron and bridge vibrations generate vibronic signatures that label some of the D-to-A pathways traversed by the electron, thus eliminating unmarked pathways that would otherwise contribute to the DA coupling (as in other "which way" or double-slit experiments)? Might medium fluctuations drive tunneling–hopping mechanistic transitions? How does the donor-state preparation, in particular, its polarization toward the acceptor and its momentum characteristics (which may introduce complex rather than pure real relationships among donor orbital amplitudes), influence the electronic dynamics?
In this Account, we describe our recent studies that address puzzling questions of how conformational distributions, excited-state polarization, and electronic and nuclear dynamical effects influence ET in macromolecules. Indeed, conformational and dynamical effects arise in all transport regimes, including the tunneling, resonant transport, and hopping regimes. Importantly, these effects can induce switching among ET mechanisms.
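The nonadiabatic rate expression quoted in this abstract, k = (2π/ħ)⟨H_DA²⟩ρ_FC, can be sketched numerically as follows. This is a minimal sketch under stated assumptions rather than the authors' procedure: the abstract does not specify a form for ρ_FC, so the code assumes the classical (high-temperature) Marcus expression, ρ_FC = (4πλk_BT)^(-1/2) exp[-(ΔG + λ)² / (4λk_BT)], and every numerical input (the couplings, the driving force `delta_g`, the reorganization energy `lam`) is a hypothetical placeholder.

```python
import math

HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV*s
KB_EV_K = 8.617333262e-5     # Boltzmann constant in eV/K

def franck_condon_density(delta_g, lam, temperature=300.0):
    """Classical Marcus form of rho_FC in 1/eV (an assumed model, see lead-in)."""
    kt = KB_EV_K * temperature
    prefactor = 1.0 / math.sqrt(4.0 * math.pi * lam * kt)
    return prefactor * math.exp(-((delta_g + lam) ** 2) / (4.0 * lam * kt))

def nonadiabatic_rate(couplings, delta_g, lam, temperature=300.0):
    """Rate in 1/s from k = (2*pi/hbar) * <H_DA^2> * rho_FC.

    `couplings` holds H_DA values (eV) sampled over a conformational
    ensemble; the mean of their squares is the ensemble average <H_DA^2>.
    """
    mean_square = sum(h * h for h in couplings) / len(couplings)
    rho_fc = franck_condon_density(delta_g, lam, temperature)
    return (2.0 * math.pi / HBAR_EV_S) * mean_square * rho_fc

# Hypothetical ensemble of couplings (eV), e.g. one value per MD snapshot.
sampled_couplings = [1.0e-4, 3.0e-4, 0.5e-4]
rate = nonadiabatic_rate(sampled_couplings, delta_g=-0.5, lam=0.7)
```

Because the rate depends on the mean of the squared couplings rather than the square of the mean, a few snapshots with large coupling can dominate the average, which is one way to picture the "minority species" question raised in the abstract.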
FEP Augmentation as a Means to Solve Data Paucity Problems for Machine Learning in Chemical Biology
In the realm of medicinal
chemistry, the primary objective is to
swiftly optimize a multitude of chemical properties of a set of compounds
to yield a clinical candidate poised for clinical trials. In recent
years, two computational techniques, machine learning (ML) and physics-based
methods, have evolved substantially and are now frequently incorporated
into the medicinal chemist's toolbox to enhance the efficiency
of both hit optimization and candidate design. Both computational
methods come with their own set of limitations, and they are often
used independently of each other. ML's capability to screen
extensive compound libraries expediently is tempered by its reliance
on quality data, which can be scarce especially during early-stage
optimization. In contrast, physics-based approaches like free energy
perturbation (FEP) are frequently constrained by low throughput and
high cost by comparison; however, physics-based methods are capable
of making highly accurate binding affinity predictions. In this study,
we harnessed the strength of FEP to overcome data paucity in ML by
generating virtual activity data sets which then inform the training
of algorithms. Here, we show that ML algorithms trained with an FEP-augmented
data set could achieve predictive accuracy comparable to that of algorithms
trained on experimental data from biological assays. Throughout the
paper, we emphasize key mechanistic considerations that must be taken
into account when aiming to augment data sets and lay the groundwork
for successful implementation. Ultimately, the study advocates for
the synergy of physics-based methods and ML to expedite the lead optimization
process. We believe that the physics-based augmentation of ML will
significantly benefit drug discovery, as these techniques continue
to evolve.