A posteriori metadata from automated provenance tracking: Integration of AiiDA and TCOD
In order to make results of computational scientific research findable,
accessible, interoperable and re-usable, it is necessary to decorate them with
standardised metadata. However, there are a number of technical and practical
challenges that make this process difficult to achieve in practice. Here the
implementation of a protocol is presented to tag crystal structures with their
computed properties, without the need for human intervention to curate the data.
This protocol leverages the capabilities of AiiDA, an open-source platform to
manage and automate scientific computational workflows, and TCOD, an
open-access database storing computed materials properties using a well-defined
and exhaustive ontology. Based on these, the complete procedure to deposit
computed data in the TCOD database is automated. All relevant metadata are
extracted from the full provenance information that AiiDA tracks and stores
automatically while managing the calculations. Such a protocol also enables
reproducibility of scientific data in the field of computational materials
science. As a proof of concept, the AiiDA-TCOD interface is used to deposit 170
theoretical structures together with their computed properties and their full
provenance graphs, consisting of over 4600 AiiDA nodes.
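The idea of extracting metadata from a stored provenance graph can be illustrated with a toy example. The sketch below does not use the AiiDA API; the node names, attributes and the helper functions `collect_ancestors` and `metadata_for` are all hypothetical and only show how walking the ancestors of a result node gathers the information needed to describe how it was produced.

```python
from collections import deque

# Toy provenance graph: each node maps to the nodes it was produced from.
# All node names and attributes are hypothetical, not real AiiDA identifiers.
PARENTS = {
    "relaxed_structure": ["relax_calc"],
    "relax_calc": ["input_structure", "parameters", "code"],
    "input_structure": [],
    "parameters": [],
    "code": [],
}
ATTRIBUTES = {
    "relax_calc": {"walltime_s": 1200},
    "parameters": {"cutoff_Ry": 60},
    "code": {"name": "example-dft-code", "version": "1.0"},
}


def collect_ancestors(node, parents):
    """Return every ancestor of `node`, in breadth-first order."""
    seen, queue, order = {node}, deque([node]), []
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def metadata_for(node, parents, attributes):
    """Flatten the attributes of all ancestors into 'node.key' metadata tags."""
    tags = {}
    for ancestor in collect_ancestors(node, parents):
        for key, value in attributes.get(ancestor, {}).items():
            tags[f"{ancestor}.{key}"] = value
    return tags


tags = metadata_for("relaxed_structure", PARENTS, ATTRIBUTES)
for name, value in sorted(tags.items()):
    print(name, "=", value)
```

Because the tags are derived from the graph rather than typed in by hand, no manual curation step is needed in this toy setting, which is the property the abstract emphasises.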
Using SMILES strings for the description of chemical connectivity in the Crystallography Open Database
Computer descriptions of chemical molecular connectivity are necessary for searching chemical databases and for
predicting chemical properties from molecular structure. In this article, the ongoing work to describe the chemical
connectivity of entries contained in the Crystallography Open Database (COD) in SMILES format is reported. This collection
of SMILES is publicly available for chemical (substructure) search or for any other purpose on an open-access
basis, as is the COD itself. The conventions that have been followed for the representation of compounds that do
not fit into the valence bond theory are outlined for the most frequently found cases. The procedure for getting the
SMILES out of the CIF files starts with checking whether the atoms in the asymmetric unit are a chemically acceptable
image of the compound. When they are not (molecule on a symmetry element, disorder, polymeric species, etc.),
the previously published cif_molecule program is used to obtain such an image in many cases. The program package
Open Babel is then applied to get SMILES strings from the CIF files (either those directly taken from the COD or those
produced by cif_molecule when applicable). The results are then checked and/or fixed by a human editor, in a
computer-aided task that at present still consumes a great deal of human time. Even if the procedure still needs to be
improved to make it more automatic (and hence faster), it has already yielded more than 160,000 curated chemical
structures and the purpose of this article is to announce the existence of this work to the chemical community as well
as to spread the use of its results.
The authors are grateful to the Junta de Andalucía (Research Group FQM-195)
for financial support of the publication costs of this article.
Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration
Using an open-access distribution model, the Crystallography Open Database (COD, http://www.crystallography.net) collects all known ‘small molecule / small to medium sized unit cell’ crystal structures and makes them freely available on the Internet. As of today, the COD has aggregated ∼150 000 structures, offering basic search capabilities and the possibility to download the whole database, or parts thereof, using a variety of standard open communication protocols. A newly developed website allows all registered users to deposit published and so far unpublished structures as personal communications or pre-publication depositions. Such a setup enables many users to extend the COD database simultaneously. This increases the possibilities for growth of the COD database, and is the first step towards establishing a worldwide Internet-based collaborative platform dedicated to the collection and curation of structural knowledge.
OPTIMADE, an API for exchanging materials data
The Open Databases Integration for Materials Design (OPTIMADE) consortium has designed a universal application programming interface (API) to make materials databases accessible and interoperable. We outline the first stable release of the specification, v1.0, which is already supported by many leading databases and several software packages. We illustrate the advantages of the OPTIMADE API through worked examples on each of the public materials databases that support the full API specification.
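As a rough illustration of what such a query looks like, the sketch below builds an OPTIMADE `structures` request URL without sending it. The base URL is an assumption (the COD endpoint as commonly listed by providers; verify it before use), and `structures_url` is a hypothetical helper, not part of any OPTIMADE client library. The `filter`, `page_limit` and `response_fields` query parameters are the ones defined by the specification.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Assumed base URL of an OPTIMADE provider (here: COD); verify before use.
BASE_URL = "https://www.crystallography.net/cod/optimade"


def structures_url(base_url, filter_expr, page_limit=5,
                   fields=("id", "chemical_formula_reduced")):
    """Build a /v1/structures query URL (hypothetical helper, no network access)."""
    query = {
        "filter": filter_expr,                # OPTIMADE filter expression
        "page_limit": page_limit,             # maximum entries per page
        "response_fields": ",".join(fields),  # properties to return
    }
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return f"{base_url.rstrip('/')}/v1/structures?{urlencode(query, quote_via=quote)}"


# Binary compounds containing both Si and O.
url = structures_url(BASE_URL, 'elements HAS ALL "Si","O" AND nelements=2')
print(url)

# Round-trip check: decoding the query string recovers the original filter.
print(parse_qs(urlsplit(url).query)["filter"][0])
```

The same URL pattern should work against any provider that implements the specification, which is the interoperability the abstract refers to; only `BASE_URL` changes.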
Extraction and usage of crystallographic knowledge for refinement and validation of molecular models
This dissertation describes fully automated means to extract geometric information – interatomic bond lengths, bond and dihedral angles – from small-molecule crystal structures, and to use this information for the validation of novel crystal structures. The Crystallography Open Database (COD), a regularly updated open-access resource of small-molecule crystal structures, has been chosen as the source of input data. Software has been developed to prefilter the records from the COD, transform them to a form appropriate for geometric analysis, and extract and organise the geometric parameters. Statistical models chosen to describe the groups of chemically similar observations can be used for Bayesian method-based outlier detection: previously unseen, or relatively rarely seen, geometric observations in the molecules under consideration are spotted and marked for further analysis. Software implementing this principle has been developed and a Web-based user interface has been presented. The method for structure validation has been tested with novel, retracted and deliberately deformed small-molecule crystal structures. The main conclusions of this dissertation are that the COD is a proper resource for small-molecule geometric information, and that the developed methods and software tools are sufficient to organise the data from the source database into a library of molecular geometry, which is in turn capable of spotting unusual geometric features in small-molecule crystal structures.
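The outlier-detection principle can be sketched with a much simpler stand-in than the Bayesian models the dissertation describes: a single Gaussian fitted to a reference group of chemically similar bond lengths, with observations flagged by their z-score. The bond lengths, the threshold and the function `flag_outliers` below are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def flag_outliers(reference, observed, threshold=3.0):
    """Flag observations lying more than `threshold` standard deviations
    from the mean of the reference group (plain z-score test, used here
    as a simplified stand-in for a Bayesian model)."""
    mu, sigma = mean(reference), stdev(reference)
    flagged = []
    for label, value in observed.items():
        z = (value - mu) / sigma
        if abs(z) > threshold:
            flagged.append((label, z))
    return flagged


# Hypothetical reference library of C-C single-bond lengths, in angstroms.
reference_cc = [1.52, 1.53, 1.54, 1.53, 1.52, 1.54, 1.53, 1.55, 1.51, 1.53]

# Hypothetical bonds from a structure under validation.
observed = {"C1-C2": 1.53, "C2-C3": 1.41}

for label, z in flag_outliers(reference_cc, observed):
    print(f"{label}: unusual bond length (z = {z:.1f})")
```

In this toy data set only the shortened C2-C3 bond is marked for further analysis; a real library would hold one statistical model per group of chemically similar observations.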
Graph isomorphism‑based algorithm for cross‑checking chemical and crystallographic descriptions
Published reports of chemical compounds often contain multiple machine-readable descriptions which may supplement
each other in order to yield coherent and complete chemical representations. This publication presents a
method to cross-check such descriptions using a canonical representation and isomorphism of molecular graphs.
If immediate agreement between compound descriptions is not found, the algorithm derives the minimal set of
simplifications required for both descriptions to arrive at a matching form (if any). The proposed algorithm is used to
cross-check chemical descriptions from the Crystallography Open Database to identify coherently described entries
as well as those requiring further curation.
Research Council of Lithuania under Grant agreement No. MIP-20-2.
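The core test behind such a cross-check, whether two molecular graphs are isomorphic, can be sketched with a small backtracking search. This is not the published algorithm (it uses neither a canonical representation nor the derivation of simplifications); it is a minimal sketch that only checks whether an element-preserving and bond-preserving atom mapping exists. The functions `adjacency` and `is_isomorphic` and the heavy-atom example graphs are hypothetical.

```python
def adjacency(n_atoms, bonds):
    """Build neighbour sets from a list of (i, j) bonded atom index pairs."""
    adj = [set() for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def is_isomorphic(atoms_a, bonds_a, atoms_b, bonds_b):
    """Return True if an element- and bond-preserving atom mapping exists."""
    if len(atoms_a) != len(atoms_b) or len(bonds_a) != len(bonds_b):
        return False
    if sorted(atoms_a) != sorted(atoms_b):
        return False
    adj_a = adjacency(len(atoms_a), bonds_a)
    adj_b = adjacency(len(atoms_b), bonds_b)

    def extend(mapping, used):
        # mapping[k] is the atom of B assigned to atom k of A.
        i = len(mapping)
        if i == len(atoms_a):
            return True
        for j in range(len(atoms_b)):
            if j in used or atoms_a[i] != atoms_b[j]:
                continue
            if len(adj_a[i]) != len(adj_b[j]):
                continue
            # Bonds to already-mapped atoms must agree in both graphs.
            if all((k in adj_a[i]) == (mapping[k] in adj_b[j]) for k in range(i)):
                mapping.append(j)
                used.add(j)
                if extend(mapping, used):
                    return True
                mapping.pop()
                used.remove(j)
        return False

    return extend([], set())


# Heavy-atom graphs: ethanol in two atom orderings, and dimethyl ether.
ethanol_1 = (["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_2 = (["O", "C", "C"], [(0, 1), (1, 2)])
ether = (["C", "O", "C"], [(0, 1), (1, 2)])

print(is_isomorphic(*ethanol_1, *ethanol_2))
print(is_isomorphic(*ethanol_1, *ether))
```

Two descriptions of the same compound with different atom numbering match, whereas a constitutional isomer with the same formula does not; bond orders, charges and hydrogens are ignored in this sketch.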