649 research outputs found

    Engineering polymer informatics: Towards the computer-aided design of polymers

    Get PDF
    The computer-aided design of polymers is one of the holy grails of modern chemical informatics and of significant interest to a number of communities in polymer science. The paper outlines a vision for the in silico design of polymers and presents an information model based on modern semantic web technologies, thus laying the foundations for achieving that vision.

    Chemical information matters: an e-Research perspective on information and data sharing in the chemical sciences

    No full text
    Recently, a number of organisations have called for open access to scientific information and especially to the data obtained from publicly funded research, among which the Royal Society report and the European Commission press release are particularly notable. It has long been accepted that building research on the foundations laid by other scientists is both effective and efficient. Regrettably, some disciplines, chemistry being one, have been slow to recognise the value of sharing and have thus been reluctant to curate their data and information in preparation for exchanging it. The very significant increases in both the volume and the complexity of the datasets produced have encouraged the expansion of e-Research, and stimulated the development of methodologies for managing, organising, and analysing "big data". We review the evolution of cheminformatics, the amalgam of chemistry, computer science, and information technology, and assess the wider e-Science and e-Research perspective. Chemical information does matter, as do matters of communicating data and collaborating with data. For chemistry, unique identifiers, structure representations, and property descriptors are essential to the activities of sharing and exchange. Open science entails the sharing of more than mere facts: for example, the publication of negative outcomes can facilitate better understanding of which synthetic routes to choose, an aspiration of the Dial-a-Molecule Grand Challenge. The protagonists of open notebook science go even further and exchange their thoughts and plans. We consider the concepts of preservation, curation, provenance, discovery, and access in the context of the research lifecycle, and then focus on the role of metadata, particularly the ontologies on which the emerging chemical Semantic Web will depend. Among our conclusions, we present our choice of the "grand challenges" for the preservation and sharing of chemical information.
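
    Unique identifiers and structure representations are central to the sharing this abstract describes. As a minimal illustration, assuming the open-source RDKit toolkit (not mentioned in the paper), one molecule can be rendered as a canonical SMILES, an InChI, and an InChIKey:

```python
# A minimal sketch of the identifier workflow the abstract alludes to, assuming
# RDKit is installed; the caffeine SMILES is an illustrative example input.
from rdkit import Chem

smiles = "CN1C=NC2=C1C(=O)N(C)C(=O)N2C"  # caffeine, as an example
mol = Chem.MolFromSmiles(smiles)

print(Chem.MolToSmiles(mol))    # canonical SMILES: one structure representation
print(Chem.MolToInchi(mol))     # IUPAC InChI: a unique, layered identifier
print(Chem.MolToInchiKey(mol))  # fixed-length hash, convenient for exchange and search
```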

    Ontology and Knowledge Base of Brittle Deformation Microstructures for the San Andreas Fault Observatory at Depth (SAFOD) Core Samples

    Get PDF
    The quest to answer fundamental questions and solve complex problems is a principal tenet of Earth science. The pursuit of scientific knowledge has generated profuse research, resulting in a plethora of information-rich resources. This phenomenon offers great potential for scientific discovery; however, a deficiency in information connectivity and processing standards has become evident, creating a demand for tools to facilitate and process this upsurge in information. This ontology project answers that demand. The primary purpose of this domain-specific ontology and knowledge base is to organize, connect, and correlate research data related to brittle deformation microstructures. This semantically enabled ontology may be queried to return not only asserted information but also inferred knowledge that may not be evident. In addition, its standardized development in OWL-DL (Web Ontology Language-Description Logic) allows the potential for sharing and reuse among other geologic science communities.
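
    The asserted-versus-inferred distinction the abstract draws can be sketched minimally as follows, assuming the owlready2 Python library; the ontology file and class name are hypothetical, not the actual SAFOD ontology:

```python
# A minimal sketch of querying an OWL-DL ontology for asserted and inferred
# facts, assuming owlready2 (plus a Java runtime for its bundled reasoner).
# The file name and class name below are hypothetical placeholders.
from owlready2 import get_ontology, sync_reasoner

onto = get_ontology("file://brittle_deformation.owl").load()

with onto:
    sync_reasoner()  # classify: adds inferred subclass/instance relations

# After reasoning, search() returns asserted *and* inferred members.
for sample in onto.search(is_a=onto.CataclasiteMicrostructure):
    print(sample.name)
```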

    CMLLite: a design philosophy for CML.

    Get PDF
    CMLLite is a collection of definitions and processes which provide strong and flexible validation for a document in Chemical Markup Language (CML). It consists of an updated CML schema (schema3), conventions specifying rules in both human- and machine-understandable forms, and a validator, available both online and offline, to check conformance. This article explores the rationale behind the changes which have been made to the schema, explains how conventions interact and how they are designed, formulated, implemented and tested, and gives an overview of the validation service.
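
    Plain schema validation, the first layer of what CMLLite provides, can be illustrated with a short sketch; it assumes the lxml library and placeholder file names, and CMLLite's convention-level rules would need checks beyond plain XSD validation:

```python
# A minimal sketch of schema-based validation in the spirit of CMLLite,
# assuming lxml; 'cml_schema3.xsd' and 'caffeine.cml' are placeholder names
# for a local copy of the schema and a document to check.
from lxml import etree

schema = etree.XMLSchema(etree.parse("cml_schema3.xsd"))
doc = etree.parse("caffeine.cml")

if schema.validate(doc):
    print("valid against the schema")
else:
    for error in schema.error_log:
        print(error.line, error.message)
```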

    Metal Cations in Protein Force Fields: From Data Set Creation and Benchmarks to Polarizable Force Field Implementation and Adjustment

    Get PDF
    Metal cations are essential to life. About one-third of all proteins require metal cofactors to accurately fold or to function. Computer simulations using empirical parameters and classical molecular mechanics models (force fields) are the standard tool to investigate proteins’ structural dynamics and functions in silico. Despite many successes, the accuracy of force fields is limited when cations are involved. The focus of this thesis is the development of tools and strategies to create system-specific force field parameters to accurately describe cation-protein interactions. The accuracy of a force field mainly relies on (i) the parameters derived from increasingly large quantum chemistry or experimental data and (ii) the physics behind the energy formula. The first part of this thesis presents a large and comprehensive quantum chemistry data set on a consistent computational footing that can be used for force field parameterization and benchmarking. The data set covers dipeptides of the 20 proteinogenic amino acids with different possible side chain protonation states, 3 divalent cations (Ca2+, Mg2+, and Ba2+), and a wide relative energy range. Crucial properties related to force field development, such as partial charges, interaction energies, etc., are also provided. To make the data available, the data set was uploaded to the NOMAD repository and its data structure was formalized in an ontology. Besides a proper data basis for parameterization, the physics covered by the terms of the additive force field formulation impacts its applicability. The second part of this thesis benchmarks three popular non-polarizable force fields and the polarizable Drude model against a quantum chemistry data set. After some adjustments, the Drude model was found to reproduce the reference interaction energies substantially better than the non-polarizable force fields, which showed the importance of explicitly addressing polarization effects. Tweaking of the Drude model involved Boltzmann-weighted fitting to optimize Thole factors and Lennard-Jones parameters. The obtained parameters were validated by (i) their ability to reproduce reference interaction energies and (ii) molecular dynamics simulations of the N-lobe of calmodulin. This work facilitates the improvement of polarizable force fields for cation-protein interactions by quantum chemistry-driven parameterization combined with molecular dynamics simulations in the condensed phase. While the Drude model shows its potential in simulating cation-protein interactions, it lacks a description of charge-transfer effects, which are significant between cation and protein. The CTPOL model extends the classical force field formulation by charge transfer (CT) and polarization (POL). Since the CTPOL model is not readily available in any of the popular molecular-dynamics packages, it was implemented in OpenMM. Furthermore, an open-source parameterization tool, called FFAFFURR, was implemented that enables the (system-specific) parameterization of OPLS-AA and CTPOL models. Following the method established in the previous part, the performance of FFAFFURR was evaluated by its ability to reproduce quantum chemistry energies and molecular dynamics simulations of a zinc finger protein.
In conclusion, this thesis takes a step towards the development of next-generation force fields that accurately describe cation-protein interactions by providing (i) reference data, (ii) a force field model that includes charge transfer and polarization, and (iii) a freely available parameterization tool.
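
    As an illustration of the kind of polarizable simulation the thesis builds on, the following is a minimal Drude setup in OpenMM, the package into which CTPOL was implemented; the file names are placeholders, and this is not the thesis's own CTPOL or FFAFFURR code:

```python
# A minimal sketch of a polarizable Drude simulation in OpenMM. 'protein.pdb'
# is a placeholder and is assumed to contain a pre-solvated, periodic system;
# 'charmm_polar_2013.xml' is the CHARMM Drude force field bundled with recent
# OpenMM versions.
from openmm.app import PDBFile, Modeller, ForceField, Simulation, PME, HBonds
from openmm import DrudeLangevinIntegrator
from openmm.unit import kelvin, picosecond, picoseconds, nanometer

pdb = PDBFile("protein.pdb")
forcefield = ForceField("charmm_polar_2013.xml")

# Drude models attach an auxiliary charged particle to each polarizable atom.
modeller = Modeller(pdb.topology, pdb.positions)
modeller.addExtraParticles(forcefield)

system = forcefield.createSystem(modeller.topology, nonbondedMethod=PME,
                                 nonbondedCutoff=1*nanometer, constraints=HBonds)

# Dual-thermostat integrator: physical atoms at 300 K, Drude particles kept cold.
integrator = DrudeLangevinIntegrator(300*kelvin, 1/picosecond,
                                     1*kelvin, 20/picosecond, 0.001*picoseconds)

sim = Simulation(modeller.topology, system, integrator)
sim.context.setPositions(modeller.positions)
sim.minimizeEnergy()
sim.step(1000)
```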

    A Web3D Enabled Information Integration Framework for Facility Management

    Full text link
    Managing capital oil and gas and civil engineering facilities requires a large amount of heterogeneous information that is generated by different project stakeholders across the facility lifecycle phases and is stored in various databases and technical documents. The amount of information reaches its peak during the commissioning and handover phases when the project is handed over to the operator. The operational phase of facilities spans multiple decades, and the way facilities are used and maintained has a huge impact on costs, environment, productivity, and health and safety. Thus, the client and the operator bear most of the additional costs associated with incomplete, incorrect or not immediately usable information. Web applications can provide quick and convenient access to information regardless of user location. However, the integration and delivery of engineering information, including 3D content, over the Web is still in its infancy and is affected by numerous technical (i.e. data and tools) and procedural (i.e. process and people) challenges. This paper addresses the technical issues and proposes a Web3D enabled information integration framework that delivers engineering information together with 3D content without any plug-ins. In the proposed framework, a class library defines the engineering data requirements and a semi-structured database provides means to integrate heterogeneous technical asset information. This framework also enables separating the 3D model content into fragments, storing them together with the digital assets and delivering them to the client browser on demand. Such a framework partially alleviates the current limitations of JavaScript-based 3D content delivery such as application speed and latency. Hence, the proposed framework is particularly valuable to petroleum and civil engineering companies working with large amounts of data.
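
    The on-demand fragment delivery described above can be sketched minimally as follows; the HTTP route, storage layout, and glTF format are illustrative assumptions (here in Python with Flask), not the paper's actual implementation:

```python
# A minimal sketch of serving pre-split 3D model fragments on demand; routes,
# directory layout, and the glTF-binary format are illustrative choices only.
from pathlib import Path
from flask import Flask, abort, send_file

app = Flask(__name__)
FRAGMENT_DIR = Path("fragments")  # fragments stored alongside the asset data

@app.route("/model/<asset_id>/fragment/<int:index>")
def fragment(asset_id, index):
    path = FRAGMENT_DIR / asset_id / f"{index}.glb"
    if not path.exists():
        abort(404)
    # The browser requests only the fragments currently in view, keeping the
    # initial load small compared to shipping one monolithic model.
    return send_file(path, mimetype="model/gltf-binary")

if __name__ == "__main__":
    app.run()
```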

    In Silico Systems Analysis of Biopathways

    Get PDF
    Chen M. In silico systems analysis of biopathways. Bielefeld (Germany): Bielefeld University; 2004. In the past decade, with the advent of high-throughput technologies, biology has migrated from a descriptive science to a predictive one. A vast amount of information on metabolism has been produced, and a number of specialized genetic/metabolic databases and computational systems have been developed, which makes it possible for biologists to perform in silico analysis of metabolism. With experimental data from the laboratory, biologists wish to conduct their analysis systematically with an easy-to-use computational system. One major task is to implement molecular information systems that allow different molecular database systems to be integrated, and to design analysis tools (e.g. simulators of complex metabolic reactions). Three key problems are involved: 1) modeling and simulation of biological processes; 2) reconstruction of metabolic pathways, leading to predictions about the integrated function of the network; and 3) comparison of metabolism, providing an important way to reveal the functional relationship between a set of metabolic pathways. This dissertation addresses these problems of in silico systems analysis of biopathways. We developed a software system to integrate access to different databases, and exploited the Petri net methodology to model and simulate metabolic networks in cells. The dissertation develops a computer modeling and simulation technique based on Petri net methodology; investigates metabolic networks at a system level; proposes a markup language for biological data interchange among diverse biological simulators and Petri net tools; establishes a web-based information retrieval system for metabolic pathway prediction; presents an algorithm for metabolic pathway alignment; recommends a nomenclature for cellular signal transduction; and attempts to standardize the representation of biological pathways. Hybrid Petri net methodology is exploited to model metabolic networks. A kinetic modeling strategy and a Petri net modeling algorithm are applied to capture the functioning of network elements and to perform model analysis. The proposed methodology can be used for all other metabolic networks or for virtual cell metabolism. Moreover, perspectives of Petri net modeling and simulation of metabolic networks are outlined. A proposal for the Biology Petri Net Markup Language (BioPNML) is presented. The concepts and terminology of the interchange format, as well as its XML-based syntax, are introduced. BioPNML is designed to provide a starting point for the development of a standard interchange format for bioinformatics and Petri nets. The language makes it possible to exchange biology Petri net diagrams between all supported hardware platforms and versions. It is also designed to connect Petri net models with other known metabolic simulators. A web-based metabolic information retrieval system, PathAligner, is developed in order to predict metabolic pathways from rudimentary elements of pathways. It extracts metabolic information from biological databases via the Internet and builds metabolic pathways from data sources of genes, sequences, enzymes, metabolites, etc. The system also provides a navigation platform to investigate metabolism-related information, and transforms the output data into XML files for further modeling and simulation of the reconstructed pathway. An alignment algorithm to compare the similarity between metabolic pathways is presented.
A new definition of the metabolic pathway is proposed: a pathway is defined as a linear event sequence, which is practical for our alignment algorithm. The algorithm is based on strip scoring of the similarity of the 4-level hierarchical EC numbers involved in the pathways. The algorithm described has been implemented and is in current use in the context of the PathAligner system. Furthermore, new methods for the classification and nomenclature of cellular signal transductions are recommended. For each type of characterized signal transduction, a unique ST number is provided. The Signal Transduction Classification Database (STCDB), based on the proposed classification and nomenclature, has been established. By merging the ST numbers with EC numbers, alignments of biopathways become possible. Finally, a detailed model of the urea cycle that includes gene regulatory networks, metabolic pathways and signal transduction is demonstrated using our approaches. A systems-biological interpretation of the observed behavior of the urea cycle and its related transcriptomics information is proposed to provide new insights for metabolic engineering and medical care.
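
    The strip-scoring idea can be sketched as follows; the thesis's exact scoring function is not reproduced here, so prefix matching over the four hierarchical EC levels is assumed for illustration:

```python
# A minimal sketch of aligning two pathways as linear EC-number sequences, in
# the spirit of PathAligner; the scoring below (prefix matching over the four
# hierarchy levels) is an assumption, not the thesis's exact "strip scoring".
def ec_similarity(ec_a: str, ec_b: str) -> float:
    """Score two EC numbers by how many leading hierarchy levels agree."""
    matches = 0
    for x, y in zip(ec_a.split("."), ec_b.split(".")):
        if x != y:
            break
        matches += 1
    return matches / 4  # four levels: class.subclass.sub-subclass.serial

def pathway_similarity(path_a: list[str], path_b: list[str]) -> float:
    """Average positional EC similarity of two equal-length pathways."""
    scores = [ec_similarity(x, y) for x, y in zip(path_a, path_b)]
    return sum(scores) / len(scores)

# Example: two short glycolysis-like segments differing only in the last
# EC digit of the first step (hexokinase vs. glucokinase).
print(pathway_similarity(["2.7.1.1", "5.3.1.9"], ["2.7.1.2", "5.3.1.9"]))  # 0.875
```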

    Proceedings of the 6th Dutch-Belgian Information Retrieval Workshop

    Get PDF

    Encoding, Storing and Searching of Analytical Properties and Assigned Metabolite Structures

    Get PDF
    Information about metabolites and other small organic molecules is of crucial importance in many areas of the natural sciences. Such molecules play a decisive role in metabolic networks, for example, and knowledge of their properties helps in understanding complex biological processes and complete biological systems. Since data describing these molecules are produced daily in biological and chemical laboratories, a comprehensive and continuously growing body of data exists. To allow scientists to process, exchange, archive, and search this information while preserving its semantic relationships, complex software systems and data formats are needed. The goal of this project was to develop applications and algorithms that can be used for the efficient encoding, collection, normalization, and analysis of molecular data. These are intended to support scientists in structure elucidation, dereplication, the analysis of molecular interactions, and the publication of the knowledge gained in this way. Since directly describing the structure and function of an unknown compound is difficult and laborious, this is mostly achieved indirectly, via descriptive properties, which are then used to predict structural and functional characteristics. In this context, program modules were developed that allow the visualization of structural and spectroscopic data, the structured display and editing of metadata and properties, and the import and export of various data formats. These were extended by methods for analysing the acquired information further and for assigning structural and spectroscopic data to one another. In addition, a system was developed for the structured archiving and management of large amounts of molecular data and spectroscopic information, preserving their semantic relationships, both in the file system and in databases. To guarantee lossless storage, an open and standardized data format was defined (CMLSpect). It extends the existing CML (Chemical Markup Language) vocabulary and thus allows simple handling of linked structural and spectroscopic data. The applications developed were integrated into the Bioclipse system for bio- and cheminformatics, providing the user with a high-quality interface and the developer with an easily extensible, modular program architecture.
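
    How a CMLSpect-style document ties a spectrum to the structure it is assigned to can be sketched as follows; the element and attribute names follow CML conventions, but this is an illustrative fragment, not a validated CMLSpect instance:

```python
# A minimal sketch of linking a structure and a peak list in one CML-style
# document, in the spirit of CMLSpect; the fragment below is illustrative only.
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", CML_NS)

cml = ET.Element(f"{{{CML_NS}}}cml")

molecule = ET.SubElement(cml, f"{{{CML_NS}}}molecule", id="m1")
atoms = ET.SubElement(molecule, f"{{{CML_NS}}}atomArray")
ET.SubElement(atoms, f"{{{CML_NS}}}atom", id="a1", elementType="C")

# moleculeRef ties the spectrum to the structure it was assigned to, and
# atomRefs ties an individual peak to an individual atom.
spectrum = ET.SubElement(cml, f"{{{CML_NS}}}spectrum", id="s1",
                         type="NMR", moleculeRef="m1")
peaks = ET.SubElement(spectrum, f"{{{CML_NS}}}peakList")
ET.SubElement(peaks, f"{{{CML_NS}}}peak", xValue="77.2",
              xUnits="unit:ppm", atomRefs="a1")

print(ET.tostring(cml, encoding="unicode"))
```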

    Quality Control Using Semantic Technologies in Digital Libraries

    Get PDF
    Controlled content quality, especially in terms of indexing, is one of the major advantages of digital libraries over general Web sources or Web search engines. Therefore, more and more digital libraries offer corpora related to a specialized domain. Beyond simple keyword-based searches, the resulting information systems often rely on entity-centered searches. To offer this kind of search, high-quality document processing is essential. However, considering today's information flood, the mostly manual effort of acquiring new sources and creating suitable (semantic) metadata for content indexing and retrieval is already prohibitive. A recent solution is the automatic generation of metadata, where statistical techniques such as document classification and entity extraction are becoming more widespread. But in this case, neglecting quality assurance is even more problematic, because heuristic generation often fails, and the resulting low-quality metadata will directly diminish the quality of service that a digital library provides. Thus, quality assessment of the metadata annotations used for subsequent querying of collections has to be enabled. In this thesis we discuss the importance of metadata quality assessment for information systems and the benefits gained from controlled and guaranteed quality.
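
    A minimal sketch of the kind of metadata quality assessment the thesis argues for follows; the required fields and the completeness rule are illustrative assumptions, not the thesis's actual quality model:

```python
# A minimal sketch of flagging sparse, heuristically generated metadata for
# review; fields and the threshold are illustrative assumptions only.
REQUIRED_FIELDS = ("title", "creator", "subject", "date")

def completeness(record: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    filled = sum(1 for f in REQUIRED_FIELDS if record.get(f))
    return filled / len(REQUIRED_FIELDS)

def flag_for_review(records, threshold=0.75):
    """Yield records whose metadata looks too sparse to query reliably."""
    for rec in records:
        if completeness(rec) < threshold:
            yield rec

records = [
    {"title": "CMLLite", "creator": "example author", "subject": "chemistry", "date": "2011"},
    {"title": "Untitled", "creator": "", "subject": None, "date": ""},
]
print([r["title"] for r in flag_for_review(records)])  # ['Untitled']
```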