34 research outputs found

    A Group Symmetric Stochastic Differential Equation Model for Molecule Multi-modal Pretraining

    Molecule pretraining has quickly become the go-to schema to boost the performance of AI-based drug discovery. Naturally, molecules can be represented as 2D topological graphs or 3D geometric point clouds. Although most existing pretraining methods focus on only a single modality, recent research has shown that maximizing the mutual information (MI) between these two modalities enhances the molecule representation ability. Meanwhile, existing molecule multi-modal pretraining approaches approximate MI based on the representation space encoded from the topology and geometry, thus resulting in the loss of critical structural information of molecules. To address this issue, we propose MoleculeSDE. MoleculeSDE leverages group symmetric (e.g., SE(3)-equivariant and reflection-antisymmetric) stochastic differential equation models to generate the 3D geometries from 2D topologies, and vice versa, directly in the input space. It not only obtains a tighter MI bound but also enables more downstream tasks than previous work. By comparing with 17 pretraining baselines, we empirically verify that MoleculeSDE can learn an expressive representation with state-of-the-art performance on 26 out of 32 downstream tasks

    Establishing computational approaches towards identifying malarial allosteric modulators: a case study of Plasmodium falciparum Hsp70s

    Combating malaria is almost a never-ending battle, as Plasmodium parasites develop resistance to the drugs used against them, as observed recently in artemisinin-based combination therapies. The main concern now is whether the resistant parasite strains will spread from Southeast Asia to Africa, the continent hosting most malaria cases. To prevent catastrophic results, we need to find non-conventional approaches. Allosteric drug targeting sites and modulators might be a new hope for malarial treatments. Heat shock proteins (HSPs) are potential malarial drug targets and have complex allosteric control mechanisms. Yet, studies on designing allosteric modulators against them are limited. Here, we identified allosteric modulators (SANC190 and SANC651) against P. falciparum Hsp70-1 and Hsp70-x, affecting the conformational dynamics of the proteins, delicately balanced by the endogenous ligands. Previously, we established a pipeline to identify allosteric sites and modulators. This study also further investigated alternative approaches to speed up the process by comparing all-atom molecular dynamics simulations and dynamic residue network analysis with the coarse-grained (CG) versions of the calculations. Betweenness centrality (BC) profiles for PfHsp70-1 and PfHsp70-x derived from CG simulations not only revealed similar trends but also pointed to the same functional regions and specific residues corresponding to BC profile peaks

    Predicted binding site information improves model ranking in protein docking using experimental and computer-generated target structures

    © 2015 Maheshwari and Brylinski. Background: Protein-protein interactions (PPIs) mediate the vast majority of biological processes, therefore, significant efforts have been directed to investigate PPIs to fully comprehend cellular functions. Predicting complex structures is critical to reveal molecular mechanisms by which proteins operate. Despite recent advances in the development of new methods to model macromolecular assemblies, most current methodologies are designed to work with experimentally determined protein structures. However, because only computer-generated models are available for a large number of proteins in a given genome, computational tools should tolerate structural inaccuracies in order to perform the genome-wide modeling of PPIs. Results: To address this problem, we developed eRankPPI, an algorithm for the identification of near-native conformations generated by protein docking using experimental structures as well as protein models. The scoring function implemented in eRankPPI employs multiple features including interface probability estimates calculated by eFindSitePPI and a novel contact-based symmetry score. In comparative benchmarks using representative datasets of homo- and hetero-complexes, we show that eRankPPI consistently outperforms state-of-the-art algorithms improving the success rate by ∼10 %. Conclusions: eRankPPI was designed to bridge the gap between the volume of sequence data, the evidence of binary interactions, and the atomic details of pharmacologically relevant protein complexes. Tolerating structure imperfections in computer-generated models opens up a possibility to conduct the exhaustive structure-based reconstruction of PPI networks across proteomes. The methods and datasets used in this study are available at www.brylinski.org/eRankPPI

    An extensive survey on Diffusion models

    Denoising diffusion models are gaining popularity in the field of generative modeling for several reasons, including straightforward and stable training, outstanding generative quality, and a robust probabilistic foundation. Image synthesis, video production, and molecular design are all examples of what these models can do. This thesis explores denoising diffusion models, which are statistical models that aim to remove noise from an image while preserving its important features. The study focuses on developing new techniques for improving the performance of denoising diffusion models, such as incorporating prior information about the image structure, designing more efficient numerical algorithms for solving the models, and evaluating the effectiveness of the denoising algorithms using various quality metrics. The research also investigates the application of denoising diffusion models in various image processing tasks, such as image restoration, feature extraction, and segmentation. The performance of the proposed methods is evaluated on a variety of benchmark datasets, and the results demonstrate significant improvements in denoising accuracy compared to existing state-of-the-art techniques. Overall, this thesis provides valuable insights into the development and application of denoising diffusion models, which have important applications in many fields, including medical imaging, computer vision, and remote sensing. The proposed techniques and algorithms can potentially lead to significant advances in image processing and analysis, with practical implications for improving the quality and reliability of image-based applications

    Molecular evolution of rDNA in early diverging Metazoa

    Background: The cytoplasmic ribosomal small subunit (SSU, 18S) ribosomal RNA (rRNA) is the most frequently used gene for molecular phylogenetic studies. However, information regarding its secondary structure is neglected in most phylogenetic analyses. Incorporation of this information is essential in order to apply specific rRNA evolutionary models to overcome the problem of co-evolution of paired sites, which violates the basic assumption of the independent evolution of sites made by most phylogenetic methods. Information about secondary structure also supports the process of aligning rRNA sequences across taxa. Both aspects have been shown to increase the accuracy of phylogenetic reconstructions within various taxa. Here, we explore SSU rRNA secondary structures from the three extant classes of Phylum Porifera (Grant, 1836), a pivotal, but largely unresolved taxon of early branching Metazoa. This is the first phylogenetic study of poriferan SSU rRNA data to date that includes detailed comparative secondary structure information for all three sponge classes. Results: We found base compositional and structural differences in SSU rRNA among Demospongiae, Hexactinellida (glass sponges) and Calcarea (calcareous sponges). We showed that analyses of primary rRNA sequences, including secondary structure-specific evolutionary models, in combination with reconstruction of the evolution of unusual structural features, reveal a substantial amount of additional information. Of special note was the finding that the gene tree topologies of marine haplosclerid demosponges, which are inconsistent with the current morphology-based classification, are supported by our reconstructed evolution of secondary structure features. Therefore, these features can provide alternative support for sequence-based topologies and give insights into the evolution of the molecule itself. To encourage and facilitate the application of rRNA models in phylogenetics of early metazoans, we present 52 SSU rRNA secondary structures over the taxonomic range of Porifera in a database, along with some basic tools for relevant format-conversion. Conclusions: We demonstrated that sophisticated secondary structure analyses can increase the potential phylogenetic information of already available rDNA sequences currently accessible in databases and conclude that the importance of SSU rRNA secondary structure information for phylogenetic reconstruction is still generally underestimated, at least among certain early branching metazoans

    Predicted binding site information improves model ranking in protein docking using experimental and computer-generated target structures

    BACKGROUND: Protein-protein interactions (PPIs) mediate the vast majority of biological processes, therefore, significant efforts have been directed to investigate PPIs to fully comprehend cellular functions. Predicting complex structures is critical to reveal molecular mechanisms by which proteins operate. Despite recent advances in the development of new methods to model macromolecular assemblies, most current methodologies are designed to work with experimentally determined protein structures. However, because only computer-generated models are available for a large number of proteins in a given genome, computational tools should tolerate structural inaccuracies in order to perform the genome-wide modeling of PPIs. RESULTS: To address this problem, we developed eRank(PPI), an algorithm for the identification of near-native conformations generated by protein docking using experimental structures as well as protein models. The scoring function implemented in eRank(PPI) employs multiple features including interface probability estimates calculated by eFindSite(PPI) and a novel contact-based symmetry score. In comparative benchmarks using representative datasets of homo- and hetero-complexes, we show that eRank(PPI) consistently outperforms state-of-the-art algorithms improving the success rate by ~10 %. CONCLUSIONS: eRank(PPI) was designed to bridge the gap between the volume of sequence data, the evidence of binary interactions, and the atomic details of pharmacologically relevant protein complexes. Tolerating structure imperfections in computer-generated models opens up a possibility to conduct the exhaustive structure-based reconstruction of PPI networks across proteomes. The methods and datasets used in this study are available at www.brylinski.org/erankppi

    Modern applications of machine learning in quantum sciences

    In these Lecture Notes, we provide a comprehensive introduction to the most recent advances in the application of machine learning methods in quantum sciences. We cover the use of deep learning and kernel methods in supervised, unsupervised, and reinforcement learning algorithms for phase classification, representation of many-body quantum states, quantum feedback control, and quantum circuits optimization. Moreover, we introduce and discuss more specialized topics such as differentiable programming, generative models, statistical approach to machine learning, and quantum machine learning

    Hydropathic Interactions and Protein Structure: Utilizing the HINT Force Field in Structure Prediction and Protein‐Protein Docking.

    Protein structure prediction is a field of computational molecular modeling with an enormous potential for improvement. Side-chain geometry prediction is a critical component of this process that is crucial for computational protein structure prediction as well as for crystallographers refining experimentally determined protein crystal structures. The cornerstone of side-chain geometry prediction is the side-chain rotamer library, usually obtained through exhaustive statistical analysis of existing protein structures. Little is known, however, about the driving forces leading to the preference or suitability of one rotamer over another. Construction of 3D hydropathic interaction maps for nearly 30,000 tyrosines extracted from the PDB reveals their environments, in terms of hydrophobic and polar (collectively “hydropathic”) interactions. Using a unique 3D similarity metric, these environments were clustered with k-means. In the ϕ, ψ region (–200° < ϕ < –155°; –205° < ψ < –160°) representing 631 tyrosines, clustering reduced the set to 14 unique hydropathic environments, with most diversity arising from favorable hydrophobic interactions. Polar interactions for tyrosine include ubiquitous hydrogen bonding with the phenolic OH and a handful of unique environments surrounding the backbone. The memberships of all but one of the 14 environments are dominated by a single χ1/χ2 rotamer. Each tyrosine residue attempts to fulfill its hydropathic valence. Structural water molecules are thus used in a variety of roles throughout protein structure. A second project involves elucidating the 3D structure of CRIP1a, a cannabinoid 1 receptor (CB1R) binding protein that could provide information for designing small molecules targeting the CRIP1a-CB1R interaction. The CRIP1a protein was produced in high purity. Crystallization experiments failed, both with and without the last 9 or 12 amino acid peptide of the CB1R C-terminus. 
Attempts were made to use NMR for structure determination; however, the protein precipitated out during data acquisition. A model was thus built computationally to which the CB1R C-terminus peptide was docked. HINT was used in selecting optimum models and analyzing interactions involved in the CRIP1a-CB1R complex. The final model demonstrated key putative interactions between CRIP1a and CB1R while also predicting highly flexible areas of the CRIP1a possibly contributing to the difficulties faced during crystallization