The octet rule in chemical space: Generating virtual molecules
We present a generator of virtual molecules that selects valid chemistry on
the basis of the octet rule. Also, we introduce a mesomer group key that allows
a fast detection of duplicates in the generated structures.
Compared to existing approaches, our model is simpler and faster, generates
new chemistry and avoids invalid chemistry. Its versatility is illustrated by
the correct generation of molecules containing third-row elements and a
surprisingly adept handling of complex boron chemistry.
Without any empirical parameters, our model is designed to be valid also in
unexplored regions of chemical space. One first unexpected finding is the high
prevalence of dipolar structures among generated molecules. Comment: 24 pages, 10 figures
Robocrystallographer: Automated crystal structure text descriptions and analysis
Our ability to describe crystal structure features is of crucial importance when attempting to understand structure-property relationships in the solid state. In this paper, the authors introduce robocrystallographer, an open-source toolkit for analyzing crystal structures. This package combines new and existing open-source analysis tools to provide structural information, including the local coordination and polyhedral type, polyhedral connectivity, octahedral tilt angles, component-dimensionality, and molecule-within-crystal and fuzzy prototype identification. Using this information, robocrystallographer can generate text-based descriptions of crystal structures that resemble descriptions written by human crystallographers. The authors use robocrystallographer to investigate the dimensionalities of all compounds in the Materials Project database and highlight its potential in machine learning studies.
Communication and re-use of chemical information in bioscience.
The current methods of publishing chemical information in bioscience articles are analysed. Using 3 papers as use-cases, it is shown that conventional methods using human procedures, including cut-and-paste, are time-consuming and introduce errors. The meaning of chemical terms and the identity of compounds are often ambiguous. Valuable experimental data such as spectra and computational results are almost always omitted. We describe an Open XML architecture at proof-of-concept which addresses these concerns. Compounds are identified through explicit connection tables or links to persistent Open resources such as PubChem. It is argued that if publishers adopt these tools and protocols, then the quality and quantity of chemical information available to bioscientists will increase and the authors, publishers and readers will find the process cost-effective. An article submitted to BiomedCentral Bioinformatics, created on request with their Publicon system. The transformed manuscript is archived as PDF. Although it has been through the publisher's system, this is purely automatic and the contents are those of a pre-refereed preprint. The formatting is provided by the system and tables and figures appear at the end. An accompanying submission, http://www.dspace.cam.ac.uk/handle/1810/34580, describes the rationale and cultural aspects of publishing, abstracting and aggregating chemical information. BMC is an Open Access publisher and we emphasize that all content is re-usable under Creative Commons License.
Stereo-Aware Extension of HOSE Codes
The file attached to this record is the author's final peer-reviewed version. The Publisher's final version can be found by following the DOI link. Descriptions of molecular environments have many applications in chemoinformatics, including chemical shift prediction. Hierarchically ordered spherical environment (HOSE) codes are the most popular such descriptions. We developed a method to extend these with stereochemistry information. It enables distinguishing atoms which would be considered identical in traditional HOSE codes. The use of our method is demonstrated by chemical shift predictions for molecules in the nmrshiftdb2 database. We give a full specification and an implementation.
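To illustrate the idea of a "spherical environment" mentioned in the abstract above, the following is a minimal sketch that groups the atoms around a chosen centre by bond distance. It is not the HOSE code specification and not the authors' implementation: the function name `atom_spheres`, the adjacency-dictionary input, and the toy molecule are all assumptions made for illustration, and the sketch ignores bond orders, ring closures, ranking rules, and stereochemistry.

```python
from typing import Dict, List


def atom_spheres(
    center: int,
    bonds: Dict[int, List[int]],
    elements: Dict[int, str],
    max_spheres: int,
) -> List[List[str]]:
    """Return element symbols grouped by bond distance from `center`.

    Hypothetical helper: a breadth-first walk over the molecular graph,
    one list of element symbols per sphere (distance 1, 2, ...).
    """
    seen = {center}
    frontier = [center]
    spheres: List[List[str]] = []
    for _ in range(max_spheres):
        next_frontier: List[int] = []
        for atom in frontier:
            for neighbour in bonds[atom]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        if not next_frontier:
            break  # no atoms left further out
        spheres.append(sorted(elements[i] for i in next_frontier))
        frontier = next_frontier
    return spheres


# Toy input (assumed): heavy atoms of ethanol, C0-C1-O2.
elements = {0: "C", 1: "C", 2: "O"}
bonds = {0: [1], 1: [0, 2], 2: [1]}

spheres_from_c0 = atom_spheres(0, bonds, elements, max_spheres=3)
spheres_from_c1 = atom_spheres(1, bonds, elements, max_spheres=3)
```

Two atoms with the same sphere lists would look identical to a purely topological description of this kind, which is the situation the stereo-aware extension is meant to resolve.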
Espaloma-0.3.0: Machine-learned molecular mechanics force field for the simulation of protein-ligand systems and beyond
Molecular mechanics (MM) force fields -- the models that characterize the
energy landscape of molecular systems via simple pairwise and polynomial terms
-- have traditionally relied on human expert-curated, inflexible, and poorly
extensible discrete chemical parameter assignment rules, namely atom or valence
types. Recently, there has been significant interest in using graph neural
networks to replace this process, while enabling the parametrization scheme to
be learned in an end-to-end differentiable manner directly from quantum
chemical calculations or condensed-phase data. In this paper, we extend the
Espaloma end-to-end differentiable force field construction approach by
incorporating both energy and force fitting directly to quantum chemical data
into the training process. Building on the OpenMM SPICE dataset, we curate a
dataset containing chemical spaces highly relevant to the broad interest of
biomolecular modeling, covering small molecules, proteins, and RNA. The
resulting force field, espaloma 0.3.0, self-consistently parametrizes these
diverse biomolecular species, accurately predicts quantum chemical energies and
forces, and maintains stable quantum chemical energy-minimized geometries.
Surprisingly, this simple approach produces highly accurate protein-ligand
binding free energies when self-consistently parametrizing protein and ligand.
This approach -- capable of fitting new force fields to large quantum chemical
datasets in one GPU-day -- shows significant promise as a path forward for
building systematically more accurate force fields that can be easily extended
to new chemical domains of interest.
PubChem3D: a new resource for scientists
Background: PubChem is an open repository for small molecules and their experimental biological activity. PubChem integrates and provides search, retrieval, visualization, analysis, and programmatic access tools in an effort to maximize the utility of contributed information. There are many diverse chemical structures with similar biological efficacies against targets available in PubChem that are difficult to interrelate using traditional 2-D similarity methods. A new layer called PubChem3D is added to PubChem to assist in this analysis.
Description: PubChem generates a 3-D conformer model description for 92.3% of all records in the PubChem Compound database (when considering the parent compound of salts). Each of these conformer models is sampled to remove redundancy, guaranteeing a minimum (non-hydrogen atom pair-wise) RMSD between conformers. A diverse conformer ordering gives a maximal description of the conformational diversity of a molecule when only a subset of available conformers is used. A pre-computed search per compound record gives immediate access to a set of 3-D similar compounds (called "Similar Conformers") in PubChem and their respective superpositions. Systematic augmentation of PubChem resources to include a 3-D layer provides users with new capabilities to search, subset, visualize, analyze, and download data. A series of retrospective studies help to demonstrate important connections between chemical structures and their biological function that are not obvious using 2-D similarity but are readily apparent by 3-D similarity.
Conclusions: The addition of PubChem3D to the existing contents of PubChem is a considerable achievement, given the scope, scale, and the fact that the resource is publicly accessible and free. With the ability to uncover latent structure-activity relationships of chemical structures, while complementing 2-D similarity analysis approaches, PubChem3D represents a new resource for scientists to exploit when exploring the biological annotations in PubChem.
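The abstract above describes sampling conformers so that a minimum pair-wise RMSD is guaranteed between them. A minimal sketch of one way such pruning could work is given below. It is a greedy filter and is not PubChem's actual procedure: the function names, the threshold value, and the toy coordinates are assumptions, and the conformers are assumed to be already superimposed with atoms listed in matching order (no alignment step is performed).

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD over matched atoms; assumes both conformers are pre-aligned."""
    if len(a) != len(b) or not a:
        raise ValueError("conformers must have the same, non-zero atom count")
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))


def prune_conformers(
    conformers: Sequence[Sequence[Coord]], min_rmsd: float
) -> List[int]:
    """Greedily keep indices of conformers at least `min_rmsd` from all kept ones."""
    kept: List[int] = []
    for idx, conf in enumerate(conformers):
        if all(rmsd(conf, conformers[k]) >= min_rmsd for k in kept):
            kept.append(idx)
    return kept


# Hypothetical two-atom "conformers" (heavy atoms only), coordinates in angstroms.
conformers = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.1, 0.0)],  # nearly identical to the first
    [(0.0, 0.0, 0.0), (0.0, 1.5, 0.0)],  # clearly different
]
kept = prune_conformers(conformers, min_rmsd=0.5)
```

In this toy case the second conformer is dropped as redundant, while the first and third are kept. The result of a greedy filter depends on input order, so a production pipeline would likely need a more careful selection strategy.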
MATCH: An atom‐typing toolset for molecular mechanics force fields
We introduce a toolset of program libraries collectively titled multipurpose atom‐typer for CHARMM (MATCH) for the automated assignment of atom types and force field parameters for molecular mechanics simulation of organic molecules. The toolset includes utilities for the conversion of multiple chemical structure file formats into a molecular graph. A general chemical pattern‐matching engine using this graph has been implemented whereby assignment of molecular mechanics atom types, charges, and force field parameters is achieved by comparison against a customizable list of chemical fragments. While initially designed to complement the CHARMM simulation package and force fields by generating the necessary input topology and atom‐type data files, MATCH can be expanded to any force field and program, and has core functionality that makes it extendable to other applications such as fragment‐based property prediction. In this work, we demonstrate the accurate construction of atomic parameters of molecules within each force field included in CHARMM36 through exhaustive cross validation studies illustrating that bond charge increment rules derived from one force field can be transferred to another. In addition, using leave‐one‐out substitution, it is shown that it is also possible to substitute missing intra‐ and intermolecular parameters with ones included in a force field to complete the parameterization of novel molecules. Finally, to demonstrate the robustness of MATCH and the coverage of chemical space offered by the recent CHARMM general force field (Vanommeslaeghe, et al., J Comput Chem 2010, 31, 671), one million molecules from the PubChem database of small molecules are typed, parameterized, and minimized. © 2011 Wiley Periodicals, Inc. J Comput Chem, 2011. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/88100/1/JCC_21963_sm_SuppInfo.pdf http://deepblue.lib.umich.edu/bitstream/2027.42/88100/2/21963_ftp.pd
Application and Development of Computational Methods for Ligand-Based Virtual Screening
The detection of novel active compounds that are able to modulate the biological function of a target is the primary goal of drug discovery. Different screening methods are available to identify hit compounds having the desired bioactivity in a large collection of molecules. As a computational method, virtual screening (VS) is used to search compound libraries in silico and identify those compounds that are likely to exhibit a specific activity. Ligand-based virtual screening (LBVS) is a subdiscipline that uses the information of one or more known active compounds in order to identify new hit compounds. Different LBVS methods exist, e.g. similarity searching and support vector machines (SVMs). In order to enable the application of these computational approaches, compounds have to be described numerically. Fingerprints derived from the two-dimensional compound structure, called 2D fingerprints, are among the most popular molecular descriptors available. This thesis covers the usage of 2D fingerprints in the context of LBVS. The first part focuses on a detailed analysis of 2D fingerprints. Their performance range against a wide range of pharmaceutical targets is globally estimated through fingerprint-based similarity searching. Additionally, mechanisms by which fingerprints are capable of detecting structurally diverse active compounds are identified. For this purpose, two different feature selection methods are applied to find those fingerprint features that are most relevant for the active compounds and distinguish them from other compounds. Then, 2D fingerprints are used in SVM calculations. The SVM methodology provides several opportunities to include additional information about the compounds in order to direct LBVS search calculations. In a first step, a variant of the SVM approach is applied to the multi-class prediction problem involving compounds that are active against several related targets. 
SVM linear combination is used to recover compounds with desired activity profiles and deprioritize compounds with other activities. Then, the SVM methodology is adopted for potency-directed VS. Compound potency is incorporated into the SVM approach through potency-oriented SVM linear combination and kernel function design to direct search calculations to the preferential detection of potent hit compounds. Next, SVM calculations are applied to address an intrinsic limitation of similarity-based methods, i.e., the presence of similar compounds having large differences in their potency. A specially designed SVM approach is introduced to predict compound pairs forming such activity cliffs. Finally, the impact of different training sets on the recall performance of SVM-based VS is analyzed and caveats are identified.
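As a rough illustration of the fingerprint-based similarity searching discussed in the abstract above, the sketch below ranks a small library against a reference compound. The Tanimoto coefficient is used here as an assumption: it is a common similarity measure for 2D fingerprints, but the abstract does not name the measure used in the thesis. The fingerprints are hypothetical sets of "on" bit indices; in practice they would come from a cheminformatics toolkit.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(
    reference: FrozenSet[int],
    library: Dict[str, FrozenSet[int]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank library compounds by similarity to the reference, best first."""
    scored = [(name, tanimoto(reference, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy fingerprints and compound names.
reference = frozenset({1, 4, 7, 9})
library = {
    "cpd_a": frozenset({1, 4, 7, 9}),
    "cpd_b": frozenset({1, 4, 8}),
    "cpd_c": frozenset({2, 3}),
}
hits = similarity_search(reference, library, top_k=2)
```

Here `cpd_a` ranks first with a coefficient of 1.0 and `cpd_b` second with 0.4 (two shared bits out of five distinct bits). Compounds with very different potency can still score as highly similar, which is the activity-cliff limitation the abstract mentions.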