Specification of an extensible and portable file format for electronic structure and crystallographic data

Author: A. Cucca
Aulbur
C. Freysoldt
C.-O. Almbladh
D. Caliste
Gonze
Gonze
M.A.L. Marques
M.J. Verstraete
Martin
Rew
V. Olevano
X. Gonze
Y. Pouillon
Publication date: 01/01/2008

To allow different, constantly evolving software applications to interact and exchange data, flexible file formats are needed. A file format specification for different types of content has been elaborated to allow communication of data for the software developed within the European Network of Excellence "NANOQUANTA", focusing on first-principles calculations of materials and nanosystems. It might be used by other software as well, and is described here in detail. The format relies on the NetCDF binary input/output library, already used in many different scientific communities, which provides flexibility as well as portability across languages and platforms. Thanks to NetCDF, the content can be accessed by keywords, ensuring the file format is extensible and backward compatible.
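The link between keyword access and backward compatibility can be illustrated without NetCDF itself. The following is only a minimal sketch in plain Python: the dictionaries stand in for files, and the variable names (`number_of_atoms`, `atomic_numbers`, `magnetic_moments`) are illustrative assumptions, not quoted from the specification. A reader that looks variables up by name, and ignores names it does not know, keeps working when a newer writer adds fields.

```python
# Sketch only: dictionaries stand in for keyword-addressable files.
# All variable names below are hypothetical, chosen for illustration.

def read_required(record, required):
    """Return the required variables, looked up by name (keyword)."""
    missing = [name for name in required if name not in record]
    if missing:
        raise KeyError(f"missing variables: {missing}")
    # Unknown extra variables are simply never requested, so they are ignored.
    return {name: record[name] for name in required}

REQUIRED = ("number_of_atoms", "atomic_numbers")

# A file written by an older program version.
old_file = {"number_of_atoms": 2, "atomic_numbers": [14, 14]}

# A file written by a newer version that added one variable.
new_file = {**old_file, "magnetic_moments": [0.0, 0.0]}

old_view = read_required(old_file, REQUIRED)
new_view = read_required(new_file, REQUIRED)
```

Because the reader never depends on the position or total number of variables, both files yield the same view; a positional binary layout would not offer that guarantee.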

arXiv.org e-Print Archive

Lund University Publications

Crossref

Open Repository and Bibliography - Liège

DIAL UCLouvain

MPG.PuRe

Data documentation & metadata

Author: Deng Sai
Publication venue: Information Bulletin on Variable Stars (IBVS)
Publication date: 19/11/2014

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

DRIVER Technology Watch Report

Author: Hochstenbach Patrick
Karstens Elbaek Mikael
Russell Rosemary
Schmelz Pedersen Gerd
Van Godtsenhoven Karen
Vanderfeesten Maurice
Publication venue: DRIVER project
Publication date: 01/01/2008

This report is part of the Discovery Workpackage (WP4) and is the third of four deliverables. Its objective is to give an overview of the latest technical developments in the world of digital repositories, digital libraries and beyond, in order to serve as theoretical and practical input for the technical DRIVER developments, especially those focused on enhanced publications. The report consists of two main parts: one focuses on interoperability standards for enhanced publications; the other consists of three subchapters, which give a landscape picture of current and emerging technologies and communities crucial to DRIVER. These three subchapters cover the GRID, CRIS and LTP communities and technologies. Every chapter contains a theoretical explanation, followed by case studies and the outcomes and opportunities for DRIVER in this field.

Ghent University Academic Bibliography

Developments and applications of the OPTIMADE API for materials discovery, design, and data exchange

Author: Andersen Casper
Andersson Oskar B.
Armiento Rickard
Balderas Rubén Castañeda
Beltrán Daniel
Bergsma Johan
Blokhin Evgeny
Boland Tara M.
Bonella Sara
Botti Silvana
Carlsson Johan Martin
Carrillo Felipe de Jesús Trejo
Castro Pedro Baptista de
Cerqueira Tiago F. T.
Choudhary Kamal
Conduit Gareth
Curtarolo Stefano
Divilov Simon
Draxl Claudia
Duarte José Manuel Nápoles
Díaz Alberto Díaz
Eckert Hagen
Eimre Kristjan
Evans Matthew
Fuentes-Cobas Luis E. E
García Rodrigo Domínguez
Gražulis Saulius
Hajiyani Hamidreza
Hanke Felix
Hospital Adam
Jose Kevin
Krajewski Adam M.
Liu Zi-Kui
Marques Miguel A. L.
Marzari Nicola
Merkys Andrius
Montero María Elena Fuentes
Morris Andrew James
Mortensen Jens Jørgen
Ong Shyue Ping
Orozco Modesto
Oses Corey
Persson Kristin A.
Pietryga Jacob
Pizzi Giovanni
Qi Ji
Riebesell Janosh
Rignanese Gian-Marco
Scheidgen Markus
Schmidt Jonathan
Thygesen Kristian Sommer
Toher Cormac
Vaitkus Antanas
Winston Donald
Wolverton Chris M
Xie Christen
Yang Xiaoyu
Yu Jusong
Zettel Adam
Publication date: 18/04/2024

The Open Databases Integration for Materials Design (OPTIMADE) application programming interface (API) empowers users with holistic access to a growing federation of databases, enhancing the accessibility and discoverability of materials and chemical data. Since the first release of the OPTIMADE specification (v1.0), the API has undergone significant development, leading to the upcoming v1.2 release, and has underpinned multiple scientific studies. In this work, we highlight the latest features of the API format and accompanying software tools, and provide an update on the implementation of OPTIMADE in contributing materials databases. We end by providing several use cases that demonstrate the utility of the OPTIMADE API in materials research and that continue to drive its ongoing development.
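To make the idea of a common query interface concrete, the sketch below builds a request URL for an OPTIMADE `structures` endpoint using only the Python standard library. The base URL `https://example.org/optimade` is a placeholder, not a real provider, and the helper name `structures_url` is hypothetical; the `filter` and `page_limit` query parameters and the `elements HAS ALL` / `nelements` filter syntax follow the OPTIMADE specification. No network request is made.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def structures_url(base_url, filter_expr, page_limit=10):
    """Build a URL for the /v1/structures endpoint of an OPTIMADE provider.

    base_url is provider-specific; the one used below is a placeholder.
    """
    query = urlencode({"filter": filter_expr, "page_limit": page_limit})
    return f"{base_url.rstrip('/')}/v1/structures?{query}"

# Binary compounds containing both silicon and oxygen.
FILTER = 'elements HAS ALL "Si","O" AND nelements=2'

url = structures_url("https://example.org/optimade", FILTER, page_limit=5)

# Decode the query string again to confirm the filter survives URL encoding.
decoded = parse_qs(urlsplit(url).query)
```

The same filter string could in principle be sent unchanged to any provider in the federation, which is the point of a shared API; only `base_url` would differ.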

University of Birmingham Research Portal

Engineering polymer informatics: Towards the computer-aided design of polymers

Author: Adams
Adams
Adams
Adams
Ai
Ai
Berners-Lee
Bicerano
Blower
Brooksbank
Carrell
Chowdhury
Chowdhury
Corbett
Cuchelkar
Davies
Degtyarenko
Elias
Feldman
Fleischmann
Frenkel
Frey
Gkoutos
Gordon
Gordon
Gordon
Hamoudeh
Herz
Holliday
Hoogenboom
Hoogenboom
Jenkins
Kanehisa
Kang
Kataoka
Keener
Lamport
Ma
Malmsten
Meier
Metanomski
Murray-Rust
Murray-Rust
Murray-Rust
Murray-Rust
Murray-Rust
Putnam
Rieder
Sankar
Schmaljohann
Service
Studer
Taylor
van der Vet
van Krevelen
Wagner
Weininger
Wiesbrock
Wilks
Wilks
Wilks
Wilks
Wu
Zamora
Zamora
Zhang
Publication venue: MACROMOL RAPID COMM
Publication date: 27/03/2008

The computer-aided design of polymers is one of the holy grails of modern chemical informatics and of significant interest for a number of communities in polymer science. The paper outlines a vision for the in silico design of polymers and presents an information model for polymers based on modern semantic web technologies, thus laying the foundations for achieving the vision.

Crossref

Apollo (Cambridge)

Automatic Analysis and Validation of the Chemical Literature

Author: Townsend JA
Publication venue: University of Cambridge
Publication date: 01/02/2008

Thesis

Methods to automatically extract and validate data from the chemical literature in legacy formats to machine-understandable forms are examined. The work focuses on three types of data: analytical data reported in articles, computational chemistry output files and crystallographic information files (CIFs). It is shown that machines are capable of reading and extracting analytical data from the current legacy formats with high recall and precision. Regular expressions cannot identify chemical names with high precision or recall, but non-deterministic methods perform significantly better. The lack of machine-understandable connection tables in the literature has been identified as the major issue preventing molecule-based data-driven science being performed in the area. The extraction of data from computational chemistry output files using parser-like approaches is shown to be not generally possible, although such methods work well for input files. A hierarchical regular-expression-based approach can parse > 99.9% of the output files correctly, although significant human input is required to prepare the templates. CIFs may be parsed with extremely high recall and precision, contain connection tables, and the data is of high quality. The comparison of bond lengths calculated by two computational chemistry programs shows good agreement in general, but structures containing specific moieties cause discrepancies. An initial protocol for the high-throughput geometry optimisation of molecules extracted from the CIFs is presented and the refinement of this protocol is discussed. Differences in bond length between calculated and experimentally determined values from the CIFs of less than 0.03 Angstrom are shown to be expected by random error. The final protocol is used to find high-quality structures from crystallography which can be reused for further science.

Unilever Centre for Molecular Science Informatics
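The 0.03 Angstrom bound mentioned in the abstract suggests a simple screening step, sketched here under stated assumptions: the bond labels and lengths are invented for illustration, and the function name `flag_discrepancies` is hypothetical, not taken from the thesis. Bonds whose calculated and experimental lengths differ by less than the bound are treated as consistent with random error; the rest are flagged for inspection.

```python
# Sketch only: the bond data below are made-up illustrative values.
TOLERANCE = 0.03  # angstrom; the random-error bound quoted in the abstract

def flag_discrepancies(bonds, tolerance=TOLERANCE):
    """Return labels of bonds with |calculated - experimental| >= tolerance."""
    return [
        label
        for label, calculated, experimental in bonds
        if abs(calculated - experimental) >= tolerance
    ]

# (label, calculated length, experimental length), all in angstrom
bonds = [
    ("C-C", 1.532, 1.524),  # differs by 0.008: within the bound
    ("C=O", 1.250, 1.210),  # differs by 0.040: flagged
    ("C-N", 1.470, 1.455),  # differs by 0.015: within the bound
]

flagged = flag_discrepancies(bonds)
```

A fixed threshold is the simplest possible choice; a real protocol would likely account for the reported experimental uncertainty of each structure as well.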

Apollo (Cambridge)

Encoding, Storing and Searching of Analytical Properties and Assigned Metabolite Structures

Author: Helmus Tobias
Publication date: 01/01/2007

Information about metabolites and other small organic molecules is of crucial importance in many different areas of the natural sciences. Such molecules play, for example, a decisive role in metabolic networks, and knowledge of their properties helps to understand complex biological processes and complete biological systems. Because data describing these molecules are generated daily in biological and chemical laboratories, a comprehensive and continuously growing body of data exists. To enable scientists to process, exchange, archive, and search this information while preserving its semantic relationships, complex software systems and data formats are needed. The aim of this project was to develop applications and algorithms that can be used for the efficient encoding, collection, normalization, and analysis of molecular data. These are intended to support scientists in structure elucidation, dereplication, the analysis of molecular interactions, and the publication of the knowledge gained in this way. Because directly describing the structure and function of an unknown compound is very difficult and laborious, this is mainly achieved indirectly, with the help of descriptive properties, which are then used to predict structural and functional characteristics. In this context, program modules were developed that allow the visualization of structural and spectroscopic data, the structured display and editing of metadata and properties, and the import and export of various data formats. These were extended with methods that make it possible to analyze the acquired information further and to assign structural and spectroscopic data to one another.
In addition, a system was developed for the structured archiving and management of large amounts of molecular data and spectroscopic information, preserving the semantic relationships, both in the file system and in databases. To guarantee lossless storage, an open and standardized data format (CMLSpect) was defined. It extends the existing CML (Chemical Markup Language) vocabulary and thus allows easy handling of linked structural and spectroscopic data. The developed applications were integrated into the Bioclipse system for bio- and cheminformatics and thus offer the user a high-quality user interface and the developer an easily extensible, modular program architecture.

Kölner UniversitätsPublikationsServer

Towards a Common Format for Computational Material Science Data

Author: Carbogno Christian
Ghiringhelli Luca M.
Huhs Georg
Levchenko Sergey
Lüders Martin
Mohamed Fawzi
Oliveira Micael
Scheffler Matthias
Publication date: 16/07/2016

Preprint arXiv:1607.04738

Information and data exchange is an important aspect of scientific progress. In computational materials science, a prerequisite for smooth data exchange is standardization, which means using agreed conventions for, e.g., units, zero base lines, and file formats. There are two main strategies to achieve this goal. One accepts the heterogeneous nature of the community, which comprises scientists from physics, chemistry, bio-physics, and materials science, by complying with the diverse ecosystem of computer codes and thus develops “converters” for the input and output files of all important codes. These converters then translate the data of all important codes into a standardized, code-independent format. The other strategy is to provide standardized open libraries that code developers can adopt for shaping their inputs, outputs, and restart files directly into the same code-independent format. We would like to emphasize in this paper that these two strategies can and should be regarded as complementary, if not even synergetic. The main concepts and software developments of both strategies are very much identical, and, obviously, both approaches should give the same final result. In this paper, we present the appropriate format and conventions that were agreed upon by two teams, the Electronic Structure Library (ESL) of CECAM and the NOMAD (NOvel MAterials Discovery) Laboratory, a European Centre of Excellence (CoE). This discussion also includes the definition of hierarchical metadata describing state-of-the-art electronic-structure calculations.

This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 676580, The NOMAD Laboratory, a European Center of Excellence, and the BBDC (contract 01IS14013E).
We thank James Kermode and Saulius Gražulis for their contribution to the discussion on the metadata, and Pasquale Pavone for precious suggestions on the metadata structure and names. We thank Patrick Rinke for carefully reading the manuscript. We thank Claudia Draxl and Kristian Thygesen for their contribution to the discussions on the necessary information to be stored for excited-state calculations and on the error bars and uncertainties. We gratefully acknowledge Damien Caliste, Fabiano Corsetti, Hubert Ebert, Jan Minar, Yann Pouillon, Thomas Ruh, David Strubbe, and Marc Torrent for their contributions to the ESCDF specifications. We acknowledge inspiring discussions with Georg Kresse, Peter Blaha, Xavier Gonze, Bernard Delley, and Jörg Hutter on the energy-zero definition and scalar-field representation. We thank Ole Andersen, Evert Jan Baerends, Peter Blaha, Lambert Colin, Bernard Delley, Thierry Deutsch, Claudia Draxl, John Kay Dewhurst, Roberto Dovesi, Paolo Giannozzi, Mike Gillan, Xavier Gonze, Michael Frisch, Martin Head-Gordon, Juerg Hutter, Klaus Koepernik, Georg Kresse, Roland Lindh, Hans Lischka, Andrea Marini, Todd Martinez, Jens Jørgen Mortensen, Frank Neese, Richard Needs, Taisuke Ozaki, Mike Payne, Angel Rubio, Trond Saue, Chris Skylaris, Jose Soler, John Stanton, James Stewart, and Marat Valiev for checking the information provided in Table 1 and for useful suggestions.
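The converter strategy described in this abstract can be sketched in a few lines. This is an illustration under explicit assumptions, not the ESL or NOMAD format: the code names `code_a` and `code_b`, the raw field names, and the output key `total_energy_eV` are all hypothetical. The only real-world values are the Hartree and Rydberg energies expressed in electronvolts. Each code reports its energy in its own unit, and the converter maps both onto one code-independent record with an agreed unit.

```python
# Sketch only: field and code names are hypothetical, not from any real format.
# Conversion factors: 1 hartree = 27.211386245988 eV, 1 rydberg = half of that.
EV_PER_UNIT = {
    "eV": 1.0,
    "hartree": 27.211386245988,
    "rydberg": 13.605693122994,
}

def to_common(code_name, raw):
    """Translate one code-specific result into a code-independent record."""
    unit = raw["energy_unit"]
    if unit not in EV_PER_UNIT:
        raise ValueError(f"unknown energy unit: {unit!r}")
    return {
        "source_code": code_name,
        "total_energy_eV": raw["energy"] * EV_PER_UNIT[unit],
    }

# Two hypothetical codes reporting the same energy in different units.
record_a = to_common("code_a", {"energy": -1.0, "energy_unit": "hartree"})
record_b = to_common("code_b", {"energy": -2.0, "energy_unit": "rydberg"})
```

After conversion, the two records can be compared directly, which is what an agreed unit convention is meant to enable; a shared energy zero would need a separate convention that this sketch does not address.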

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

Towards a Common Format for Computational Materials Science Data

Author: Carbogno C.
Ghiringhelli L.
Huhs G.
Levchenko S.
Lüders M.
Mohamed F.
Oliveira M.
Scheffler M.
Publication date: 01/07/2016

MPG.PuRe

Towards Efficient Novel Materials Discovery

Author: Lenz-Himmer Maja-Olivia
Publication venue: Humboldt-Universität zu Berlin
Publication date: 25/01/2022

The discovery of novel materials with specific functional properties is one of the highest goals in materials science. Screening the structural and chemical space for potential new material candidates is often facilitated by high-throughput methods. Fast yet precise computations are a main tool for such screenings and often start with a geometry relaxation to find the nearest low-energy configuration relative to the input structure. In part I of this work, a new constrained geometry relaxation is presented which maintains the perfect symmetry of a crystal, saves time and resources, and enables relaxations of meta-stable phases and systems with local symmetries or distortions. Apart from improving such computations for a quicker screening of the materials space, better usage of existing data is another pillar that can accelerate novel materials discovery. While many different databases exist that make computational results accessible, their usability depends largely on how the data is presented. We here investigate how semantic technologies and graph representations can improve data annotation. A number of different ontologies and knowledge graphs are developed, enabling the semantic representation of crystal structures, materials properties, as well as experimental results in the field of heterogeneous catalysis. We discuss how the approach of separating ontologies from knowledge graphs breaks down when new knowledge is created using artificial intelligence, and propose an intermediate information layer as a solution. The underlying ontologies can provide background knowledge for possible autonomous intelligent agents in the future.
We conclude that there is still a long way to go before materials science data can be made understandable to machines, and that the usefulness of semantic technologies in the domain of materials science is at the moment very limited.

Dokumenten-Publikationsserver der Humboldt-Universität zu Berlin

MPG.PuRe