Domain-Agnostic Molecular Generation with Self-feedback
The generation of molecules with desired properties has gained tremendous
popularity, revolutionizing the way scientists design molecular structures and
providing valuable support for chemical and drug design. However, despite the
potential of language models in molecule generation, they face numerous
challenges such as the generation of syntactically or chemically flawed
molecules, narrow domain focus, and limitations in creating diverse and
directionally feasible molecules due to a dearth of annotated data or external
molecular databases. To this end, we introduce MolGen, a pre-trained molecular
language model tailored specifically for molecule generation. MolGen acquires
intrinsic structural and grammatical insights by reconstructing over 100
million molecular SELFIES, while facilitating knowledge transfer between
different domains through domain-agnostic molecular prefix tuning. Moreover, we
present a self-feedback paradigm that inspires the pre-trained model to align
with the ultimate goal of producing molecules with desirable properties.
Extensive experiments on well-known benchmarks confirm MolGen's optimization
capabilities, encompassing penalized logP, QED, and molecular docking
properties. Further analysis shows that MolGen can accurately capture molecule
distributions, implicitly learn their structural characteristics, and
efficiently explore chemical space. The pre-trained model, code, and datasets
are publicly available for future research at https://github.com/zjunlp/MolGen. Comment: Work in progress. Add results of binding affinit
'The frozen accident' as an evolutionary adaptation: A rate distortion theory perspective on the dynamics and symmetries of genetic coding mechanisms
We survey some interpretations and related issues concerning the frozen hypothesis due to F. Crick and how it can be explained in terms of several natural mechanisms involving error correction codes, spin glasses, symmetry breaking and the characteristic robustness of genetic networks. The approach to most of these questions uses elements of Shannon's rate distortion theory, incorporating a semantic system that is meaningful for the relevant alphabets and vocabulary implemented in transmission of the genetic code. We apply the fundamental homology between information source uncertainty and the free energy density of a thermodynamical system with respect to transcriptional regulators and the communication channels of sequence/structure in proteins. This leads to the suggestion that the frozen accident may have been a type of evolutionary adaptation.
Structure based inhibitor design targeting glycogen phosphorylase b. Virtual screening, synthesis, biochemical and biological assessment of novel N-acyl-β-D-glucopyranosylamines
Glycogen phosphorylase (GP) is a validated target for the development of new type 2 diabetes treatments. Exploiting the Zinc docking database, we report the in silico screening of 1888 β-D-glucopyranose-NH-CO-R putative GP inhibitors differing only in their R groups. CombiGlide and GOLD docking programs with different scoring functions were employed with the best performing methods combined in a "consensus scoring" approach to ranking of ligand binding affinities for the active site. Six selected candidates from the screening were then synthesized and their inhibitory potency was assessed both in vitro and ex vivo. Their inhibition constants' values, in vitro, ranged from 5 to 377 µM while two of them were effective at causing inactivation of GP in rat hepatocytes at low µM concentrations. The crystal structures of GP in complex with the inhibitors were defined and provided the structural basis for their inhibitory potency and data for further structure based design of more potent inhibitors.
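The "consensus scoring" idea mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the scoring functions were combined, so the mean-rank scheme below is only one common choice and an assumption here. The method names, ligand labels, and scores are hypothetical illustration values, not data from the paper.

```python
def consensus_rank(scores_by_method, lower_is_better):
    """Order ligands by their mean rank across scoring methods (rank 1 = best).

    scores_by_method: {method_name: {ligand: score}} (hypothetical inputs)
    lower_is_better:  {method_name: bool}, since scoring functions differ in sign convention
    Ties within a method are broken by alphabetical ligand order (a simplification).
    """
    ligands = sorted(next(iter(scores_by_method.values())))
    totals = {lig: 0.0 for lig in ligands}
    for method, scores in scores_by_method.items():
        # Sort best-first according to this method's own convention.
        best_first = sorted(
            ligands,
            key=lambda lig: scores[lig],
            reverse=not lower_is_better[method],
        )
        for rank, lig in enumerate(best_first, start=1):
            totals[lig] += rank
    n_methods = len(scores_by_method)
    mean_rank = {lig: totals[lig] / n_methods for lig in ligands}
    order = sorted(ligands, key=lambda lig: (mean_rank[lig], lig))
    return order, mean_rank


# Hypothetical scores for three ligands under two made-up scoring methods.
scores = {
    "method_a": {"A": -9.1, "B": -7.4, "C": -8.0},  # more negative = better
    "method_b": {"A": 62.0, "B": 70.5, "C": 55.0},  # larger = better
}
order, mean_rank = consensus_rank(
    scores, lower_is_better={"method_a": True, "method_b": False}
)
print(order, mean_rank)
```

Averaging ranks rather than raw scores avoids having to put differently scaled scoring functions on a common scale, which is why it is used for this sketch.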
Frustration in Biomolecules
Biomolecules are the prime information processing elements of living matter.
Most of these inanimate systems are polymers that compute their structures and
dynamics using as input seemingly random character strings of their sequence,
following which they coalesce and perform integrated cellular functions. In
large computational systems with finite interaction codes, the appearance of
conflicting goals is inevitable. Simple conflicting forces can lead to quite
complex structures and behaviors, leading to the concept of "frustration" in
condensed matter. We present here some basic ideas about frustration in
biomolecules and how the frustration concept leads to a better appreciation of
many aspects of the architecture of biomolecules, and how structure connects to
function. These ideas are simultaneously both seductively simple and perilously
subtle to grasp completely. The energy landscape theory of protein folding
provides a framework for quantifying frustration in large systems and has been
implemented at many levels of description. We first review the notion of
frustration from the areas of abstract logic and its uses in simple condensed
matter systems. We then discuss how the frustration concept applies
specifically to heteropolymers, testing folding landscape theory in computer
simulations of protein models and in experimentally accessible systems.
Studying the aspects of frustration averaged over many proteins provides ways
to infer energy functions useful for reliable structure prediction. We discuss
how frustration affects folding, how a large part of the biological functions
of proteins are related to subtle local frustration effects and how frustration
influences the appearance of metastable states, the nature of binding
processes, catalysis and allosteric transitions. We hope to illustrate how
frustration is a fundamental concept in relating function to structural
biology. Comment: 97 pages, 30 figures
A nonuniform popularity-similarity optimization (nPSO) model to efficiently generate realistic complex networks with communities
The hidden metric space behind complex network topologies is an active topic
in current network science, and hyperbolic space is one of the most studied
because it appears to be associated with the structural organization of many
real complex systems. The Popularity-Similarity-Optimization (PSO) model
simulates how random geometric graphs grow in hyperbolic space, reproducing
strong clustering and a scale-free degree distribution; however, it fails to
reproduce an important feature of real complex networks, namely community
organization. The Geometrical-Preferential-Attachment (GPA) model was recently
developed to give the PSO a community structure as well, obtained by
forcing different angular regions of the hyperbolic disk to have variable levels
of attractiveness. However, the number and size of the communities cannot be
explicitly controlled in the GPA, which is a clear limitation for real
applications. Here, we introduce the nonuniform PSO (nPSO) model that,
unlike GPA, forces heterogeneous angular node attractiveness by
sampling the angular coordinates from a tailored nonuniform probability
distribution, for instance a mixture of Gaussians. The nPSO differs from GPA in
three other respects: it allows the number and size of communities to be fixed
explicitly; it allows their mixing property to be tuned through the network
temperature; and it efficiently generates networks with high clustering. After
several tests we propose the nPSO as a valid and efficient model to generate
networks with communities in hyperbolic space, which can be adopted as a
realistic benchmark for different tasks such as community detection and link
prediction.
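The angular sampling step described in the abstract above (drawing angular coordinates from a mixture of Gaussians so that nodes cluster into communities) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the equal mixture weights, the evenly spaced means, and the `sigma_fraction` parameter are assumptions made for the example, and the radial coordinates and link formation of the full model are omitted.

```python
import math
import random


def sample_angles(n_nodes, n_communities, sigma_fraction=1 / 6, seed=0):
    """Sample angles on the circle from an equal-weight Gaussian mixture.

    Assumptions (not taken from the abstract): component means are evenly
    spaced, all components share one standard deviation equal to
    sigma_fraction times the spacing between adjacent means, and each
    community is equally likely.
    Returns (angles, labels); labels[i] is the mixture component of node i.
    """
    rng = random.Random(seed)
    spacing = 2 * math.pi / n_communities
    means = [k * spacing for k in range(n_communities)]
    sigma = sigma_fraction * spacing
    angles, labels = [], []
    for _ in range(n_nodes):
        k = rng.randrange(n_communities)  # pick a community uniformly
        theta = rng.gauss(means[k], sigma) % (2 * math.pi)  # wrap onto the circle
        angles.append(theta)
        labels.append(k)
    return angles, labels


def circular_distance(a, b):
    """Smallest angular separation between two angles, in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


angles, labels = sample_angles(n_nodes=1000, n_communities=4)
print(len(angles), sorted(set(labels)))
```

Under these assumptions, changing the mixture weights would control the relative community sizes and changing the number of components would control how many communities appear, which is the kind of explicit control the abstract attributes to the nPSO.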
Arg343 in Human Surfactant Protein D Governs Discrimination between Glucose and N-Acetylglucosamine Ligands
Surfactant protein D (SP-D), one of the members of the collectin family of C-type lectins, is an important component of pulmonary innate immunity. SP-D binds carbohydrates in a calcium-dependent manner, but the mechanisms governing its ligand recognition specificity are not well understood. SP-D binds glucose (Glc) more strongly than N-acetylglucosamine (GlcNAc). Structural superimposition of hSP-D with mannose-binding protein C (MBP-C) complexed with GlcNAc reveals steric clashes between the ligand and the side chain of Arg343 in hSP-D. To test whether Arg343 contributes to Glc > GlcNAc recognition specificity, we constructed a computational model of the Arg343→Val (R343V) mutant hSP-D based on homology with MBP-C. Automated docking of α-Me-Glc and α-Me-GlcNAc into wild-type hSP-D and the R343V mutant of hSP-D suggests that Arg343 is critical in determining ligand-binding specificity by sterically prohibiting one binding orientation. To empirically test the docking predictions, an R343V mutant recombinant hSP-D was constructed. Inhibition analysis shows that the R343V mutant binds both Glc and GlcNAc with higher affinity than the wild-type protein and that the R343V mutant binds Glc and GlcNAc equally well. These data demonstrate that Arg343 is critical for hSP-D recognition specificity and plays a key role in defining ligand specificity differences between MBP and SP-D. Additionally, our results suggest that the number of binding orientations contributes to monosaccharide binding affinity.
A Similarity Based Approach for Chemical Category Classification
This report aims to describe the main outcomes of an IHCP Exploratory Research Project carried out during 2005 by the European Chemicals Bureau (Computational Toxicology Action). The original aim of this project was to develop a computational method to facilitate the classification of chemicals into similarity-based chemical categories, which would be both useful for building (Q)SAR models (research application) and for defining chemical category proposals (regulatory application). JRC.I-Institute for Health and Consumer Protection (Ispra)