6,308 research outputs found

    Domain-Agnostic Molecular Generation with Self-feedback

    The generation of molecules with desired properties has gained tremendous popularity, revolutionizing the way scientists design molecular structures and providing valuable support for chemical and drug design. However, despite the potential of language models in molecule generation, they face numerous challenges such as the generation of syntactically or chemically flawed molecules, narrow domain focus, and limitations in creating diverse and directionally feasible molecules due to a dearth of annotated data or external molecular databases. To this end, we introduce MolGen, a pre-trained molecular language model tailored specifically for molecule generation. MolGen acquires intrinsic structural and grammatical insights by reconstructing over 100 million molecular SELFIES, while facilitating knowledge transfer between different domains through domain-agnostic molecular prefix tuning. Moreover, we present a self-feedback paradigm that inspires the pre-trained model to align with the ultimate goal of producing molecules with desirable properties. Extensive experiments on well-known benchmarks confirm MolGen's optimization capabilities, encompassing penalized logP, QED, and molecular docking properties. Further analysis shows that MolGen can accurately capture molecule distributions, implicitly learn their structural characteristics, and efficiently explore chemical space. The pre-trained model, codes, and datasets are publicly available for future research at https://github.com/zjunlp/MolGen. Comment: Work in progress. Add results of binding affinit

    'The frozen accident' as an evolutionary adaptation: A rate distortion theory perspective on the dynamics and symmetries of genetic coding mechanisms

    We survey some interpretations and related issues concerning the frozen hypothesis due to F. Crick and how it can be explained in terms of several natural mechanisms involving error correction codes, spin glasses, symmetry breaking and the characteristic robustness of genetic networks. The approach to most of these questions involves using elements of Shannon's rate distortion theory incorporating a semantic system which is meaningful for the relevant alphabets and vocabulary implemented in transmission of the genetic code. We apply the fundamental homology between information source uncertainty with the free energy density of a thermodynamical system with respect to transcriptional regulators and the communication channels of sequence/structure in proteins. This leads to the suggestion that the frozen accident may have been a type of evolutionary adaptation

    Structure based inhibitor design targeting glycogen phosphorylase b. Virtual screening, synthesis, biochemical and biological assessment of novel N-acyl-β-D-glucopyranosylamines

    Glycogen phosphorylase (GP) is a validated target for the development of new type 2 diabetes treatments. Exploiting the Zinc docking database, we report the in silico screening of 1888 β-D-glucopyranose-NH-CO-R putative GP inhibitors differing only in their R groups. CombiGlide and GOLD docking programs with different scoring functions were employed with the best performing methods combined in a “consensus scoring” approach to ranking of ligand binding affinities for the active site. Six selected candidates from the screening were then synthesized and their inhibitory potency was assessed both in vitro and ex vivo. Their inhibition constants’ values, in vitro, ranged from 5 to 377 µM while two of them were effective at causing inactivation of GP in rat hepatocytes at low µM concentrations. The crystal structures of GP in complex with the inhibitors were defined and provided the structural basis for their inhibitory potency and data for further structure based design of more potent inhibitors

    Frustration in Biomolecules

    Biomolecules are the prime information processing elements of living matter. Most of these inanimate systems are polymers that compute their structures and dynamics using as input seemingly random character strings of their sequence, following which they coalesce and perform integrated cellular functions. In large computational systems with finite interaction codes, the appearance of conflicting goals is inevitable. Simple conflicting forces can lead to quite complex structures and behaviors, leading to the concept of "frustration" in condensed matter. We present here some basic ideas about frustration in biomolecules and how the frustration concept leads to a better appreciation of many aspects of the architecture of biomolecules, and how structure connects to function. These ideas are simultaneously both seductively simple and perilously subtle to grasp completely. The energy landscape theory of protein folding provides a framework for quantifying frustration in large systems and has been implemented at many levels of description. We first review the notion of frustration from the areas of abstract logic and its uses in simple condensed matter systems. We discuss then how the frustration concept applies specifically to heteropolymers, testing folding landscape theory in computer simulations of protein models and in experimentally accessible systems. Studying the aspects of frustration averaged over many proteins provides ways to infer energy functions useful for reliable structure prediction. We discuss how frustration affects folding, how a large part of the biological functions of proteins are related to subtle local frustration effects and how frustration influences the appearance of metastable states, the nature of binding processes, catalysis and allosteric transitions. We hope to illustrate how Frustration is a fundamental concept in relating function to structural biology. Comment: 97 pages, 30 figure

    A nonuniform popularity-similarity optimization (nPSO) model to efficiently generate realistic complex networks with communities

    The hidden metric space behind complex network topologies is a fervid topic in current network science and the hyperbolic space is one of the most studied, because it seems associated to the structural organization of many real complex systems. The Popularity-Similarity-Optimization (PSO) model simulates how random geometric graphs grow in the hyperbolic space, reproducing strong clustering and scale-free degree distribution, however it misses to reproduce an important feature of real complex networks, which is the community organization. The Geometrical-Preferential-Attachment (GPA) model was recently developed to confer to the PSO also a community structure, which is obtained by forcing different angular regions of the hyperbolic disk to have variable level of attractiveness. However, the number and size of the communities cannot be explicitly controlled in the GPA, which is a clear limitation for real applications. Here, we introduce the nonuniform PSO (nPSO) model that, differently from GPA, forces heterogeneous angular node attractiveness by sampling the angular coordinates from a tailored nonuniform probability distribution, for instance a mixture of Gaussians. The nPSO differs from GPA in other three aspects: it allows to explicitly fix the number and size of communities; it allows to tune their mixing property through the network temperature; it is efficient to generate networks with high clustering. After several tests we propose the nPSO as a valid and efficient model to generate networks with communities in the hyperbolic space, which can be adopted as a realistic benchmark for different tasks such as community detection and link prediction

    Arg343 in Human Surfactant Protein D Governs Discrimination between Glucose and N-Acetylglucosamine Ligands

    Surfactant protein D (SP-D), one of the members of the collectin family of C-type lectins, is an important component of pulmonary innate immunity. SP-D binds carbohydrates in a calcium-dependent manner, but the mechanisms governing its ligand recognition specificity are not well understood. SP-D binds glucose (Glc) stronger than N-acetylglucosamine (GlcNAc). Structural superimposition of hSP-D with mannose-binding protein C (MBP-C) complexed with GlcNAc reveals steric clashes between the ligand and the side chain of Arg343 in hSP-D. To test whether Arg343 contributes to Glc > GlcNAc recognition specificity, we constructed a computational model of Arg343→Val (R343V) mutant hSP-D based on homology with MBP-C. Automated docking of α-Me-Glc and α-Me-GlcNAc into wild-type hSP-D and the R343V mutant of hSP-D suggests that Arg343 is critical in determining ligand-binding specificity by sterically prohibiting one binding orientation. To empirically test the docking predictions, an R343V mutant recombinant hSP-D was constructed. Inhibition analysis shows that the R343V mutant binds both Glc and GlcNAc with higher affinity than the wild-type protein and that the R343V mutant binds Glc and GlcNAc equally well. These data demonstrate that Arg343 is critical for hSP-D recognition specificity and plays a key role in defining ligand specificity differences between MBP and SP-D. Additionally, our results suggest that the number of binding orientations contributes to monosaccharide binding affinity

    A Similarity Based Approach for Chemical Category Classification

    This report aims to describe the main outcomes of an IHCP Exploratory Research Project carried out during 2005 by the European Chemicals Bureau (Computational Toxicology Action). The original aim of this project was to develop a computational method to facilitate the classification of chemicals into similarity-based chemical categories, which would be both useful for building (Q)SAR models (research application) and for defining chemical category proposals (regulatory application). JRC.I-Institute for Health and Consumer Protection (Ispra

    Modeling and Simulation of Heat Transfer Phenomena
