
CML: Evolution and design.

Author: Murray-Rust Peter
Rzepa Henry S
Publication venue: J Cheminform
Publication date: 01/01/2011

A retrospective view of the design and evolution of Chemical Markup Language (CML) is presented by its original authors.

RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.

Spiral - Imperial College Digital Repository


From database to knowledge graph — using data in chemistry

Author: Kraft M
Krdzavac NB
Menon A
Publication venue: Current Opinion in Chemical Engineering
Publication date: 01/01/2019


Information extraction from chemical patents

Author: Jessop David M
Publication venue: University of Cambridge
Publication date: 15/03/2011

The automated extraction of semantic chemical data from the existing literature is demonstrated. For reasons of copyright, the work is focused on the patent literature, though the methods are expected to apply equally to other areas of the chemical literature. Hearst Patterns are applied to the patent literature in order to discover hyponymic relations describing chemical species. The acquired relations are manually validated to determine the precision of the determined hypernyms (85.0%) and of the asserted hyponymic relations (94.3%). It is demonstrated that the system acquires relations that are not present in the ChEBI ontology, suggesting that it could function as a valuable aid to the ChEBI curators. The relations discovered by this process are formalised using the Web Ontology Language (OWL) to enable re-use. PatentEye, an automated system for the extraction of reactions from chemical patents and their conversion to Chemical Markup Language (CML), is presented. Chemical patents published by the European Patent Office over a ten-week period are used to demonstrate the capability of PatentEye: 4444 reactions are extracted with a precision of 78% and recall of 64% with regard to determining the identity and amount of reactants employed, and an accuracy of 92% with regard to product identification. NMR spectra are extracted from the text using OSCAR3, which is developed to greatly increase recall. The resulting system is presented as a significant advancement towards the large-scale and automated extraction of high-quality reaction information. Extended Polymer Markup Language (EPML), a CML dialect for the description of Markush structures as they are presented in the literature, is developed. Software to exemplify and to enable substructure searching of EPML documents is presented. Further work is recommended to refine the language and code to publication quality before they are presented to the community.

Unilever
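To make the Hearst-pattern idea in the abstract above concrete, here is a minimal sketch of a single pattern ("X such as A, B and C"). It is not the thesis's implementation: the regular expression, the one-word noun phrases and the function name `extract_hyponyms` are all assumptions made for illustration, and a real system would use a chemical named-entity recogniser and a parser rather than a regex.

```python
import re

# Crude stand-in for a noun phrase: one word that is not "and"/"or".
# (Assumption for illustration; real systems use proper NP chunking.)
TOKEN = r"(?!(?:and|or)\b)[A-Za-z][\w-]*"

# Hearst pattern: "<hypernym> such as <hypo>, <hypo> and <hypo>"
SUCH_AS = re.compile(
    rf"\b(?P<hyper>{TOKEN}) such as "
    rf"(?P<hypos>{TOKEN}(?:, {TOKEN})*(?:,? (?:and|or) {TOKEN})?)"
)

# Separators inside the hyponym list: ", ", " and ", ", or " ...
SPLIT = re.compile(r",?\s+(?:and|or)\s+|,\s*")


def extract_hyponyms(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    relations = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group("hyper")
        for hyponym in SPLIT.split(match.group("hypos")):
            relations.append((hyponym, hypernym))
    return relations


sentence = "The reaction was run in aprotic solvents such as THF, DMF and acetonitrile."
pairs = extract_hyponyms(sentence)
print(pairs)
```

Each pair asserts that the first term is a kind of the second; validating such pairs by hand is what yields precision figures like those quoted in the abstract.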

The semantics of Chemical Markup Language (CML) for computational chemistry : CompChem.

Author: Kraft Markus
Murray-Rust Peter
Phadungsukanan Weerapong
Townsend Joe A
Publication venue: J Cheminform
Publication date: 07/08/2012

This paper introduces a subdomain chemistry format for storing computational chemistry data called CompChem. It has been developed based on the design, concepts and methodologies of Chemical Markup Language (CML) by adding computational chemistry semantics on top of the CML Schema. The format allows a wide range of ab initio quantum chemistry calculations of individual molecules to be stored. These calculations include, for example, single point energy calculation, molecular geometry optimization, and vibrational frequency analysis. The paper also describes the supporting infrastructure, such as processing software, dictionaries, validation tools and database repositories. In addition, some of the challenges and difficulties in developing common computational chemistry dictionaries are discussed. The uses of CompChem are illustrated by two practical applications.

RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.

The semantics of Chemical Markup Language (CML): dictionaries and conventions

Author: Adams Sam E
Murray-Rust Peter
Phadungsukanan Weerapong
Thomas Jens
Townsend Joe A
Publication date: 01/01/2011

RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.

Abstract: The semantic architecture of CML consists of conventions, dictionaries and units. The conventions conform to a top-level specification and each convention can constrain compliant documents through machine-processing (validation). Dictionaries conform to a dictionary specification which also imposes machine validation on the dictionaries. Each dictionary can also be used to validate data in a CML document, and provide human-readable descriptions. An additional set of conventions and dictionaries is used to support scientific units. All conventions, dictionaries and dictionary elements are identifiable and addressable through unique URIs.

Peer Reviewed
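As a rough illustration of what "using a dictionary to validate data in a CML document" can mean, the sketch below checks that every `dictRef` attribute in a small CML fragment points at a known dictionary entry. It is a simplification under stated assumptions: the `ex` prefix, the entry ids and the in-memory `DICTIONARIES` mapping are hypothetical, and real CML resolves a prefix to a dictionary URI via XML namespace declarations rather than a Python dict.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary content: prefix -> set of entry ids it defines.
DICTIONARIES = {"ex": {"energy", "dipole"}}

# Toy CML fragment; the second property refers to an undefined entry.
DOC = """<cml xmlns="http://www.xml-cml.org/schema">
  <property dictRef="ex:energy"><scalar>-76.02</scalar></property>
  <property dictRef="ex:entropy"><scalar>188.8</scalar></property>
</cml>"""


def unresolved_dict_refs(xml_text, dictionaries):
    """List dictRef values that do not resolve to a known dictionary entry."""
    root = ET.fromstring(xml_text)
    missing = []
    for elem in root.iter():
        ref = elem.get("dictRef")
        if ref is None:
            continue
        # A dictRef has the form "prefix:entryId".
        prefix, _, entry_id = ref.partition(":")
        if entry_id not in dictionaries.get(prefix, set()):
            missing.append(ref)
    return missing


problems = unresolved_dict_refs(DOC, DICTIONARIES)
print(problems)
```

A fuller validator would also compare each value's data type and units against what the dictionary entry declares.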

Directory of Open Access Journals

ePubs: the open archive for STFC research publications

Maastricht University Research Portal

XMPP for cloud computing in bioinformatics supporting discovery and invocation of asynchronous web services

Author: A Labarga
AR Jones
B Wallner
BioMoby Consortium
C Steinbeck
D Smedley
E Jain
E Willighagen
Egon L Willighagen
EW Sayers
GL Holliday
H Stockinger
H Sugawara
Jarl ES Wikberg
Johannes Wagener
L Stein
LM Vaquero
M Hucka
M Lapins
MA Larkin
MD Wilkinson
MWEJ Fiers
N Adams
O Spjuth
Ola Spjuth
P Fisher
P Murray-Rust
PBT Neerincx
R Kottmann
RD Dowell
S Hoon
S Hunter
S Kaarthik
S Kerrien
S Kuhn
S Miyazaki
T Oinn
UniProt Consortium
X Dong
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Life sciences make heavy use of the web for both data provision and analysis. However, the increasing amount of available data and the diversity of analysis tools call for machine-accessible interfaces in order to be effective. HTTP-based Web service technologies, like the Simple Object Access Protocol (SOAP) and REpresentational State Transfer (REST) services, are today the most common technologies for this in bioinformatics. However, these methods have severe drawbacks, including lack of discoverability and the inability of services to send status notifications. Several complementary workarounds have been proposed, but the results are ad-hoc solutions of varying quality that can be difficult to use.

Results: We present a novel approach based on the open standard Extensible Messaging and Presence Protocol (XMPP), consisting of an extension (IO Data) that covers discovery, asynchronous invocation, and definition of data types in the service. Because XMPP cloud services are capable of asynchronous communication, clients do not have to poll repeatedly for status; instead, the service sends the results back to the client upon completion. Implementations for Bioclipse and Taverna are presented, as are various XMPP cloud services in bio- and cheminformatics.

Conclusion: XMPP with its extensions is a powerful protocol for cloud services that demonstrates several advantages over traditional HTTP-based Web services: 1) services are discoverable without the need of an external registry, 2) asynchronous invocation eliminates the need for ad-hoc solutions like polling, and 3) input and output types defined in the service allow for generation of clients on the fly without the need of an external semantics description. The many advantages over existing technologies make XMPP a highly interesting candidate for next-generation online services in bioinformatics.
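The contrast between polling and asynchronous invocation can be shown with a small in-process analogy. This sketch does not use XMPP or any network: the `service` coroutine, the job string and the queue standing in for the client's inbox are all assumptions chosen only to show a client that waits for a pushed result instead of asking repeatedly for status.

```python
import asyncio


async def service(job, inbox):
    """Stand-in for a remote service: works, then pushes the result to the client."""
    await asyncio.sleep(0.01)             # pretend this is a long-running analysis
    await inbox.put((job, job.upper()))   # result is sent on completion


async def client():
    inbox = asyncio.Queue()               # plays the role of the client's message inbox
    task = asyncio.create_task(service("acgt", inbox))
    # No polling loop: the client is simply suspended until a message arrives.
    job, result = await inbox.get()
    await task
    return job, result


outcome = asyncio.run(client())
print(outcome)
```

With a polling design the client would instead loop over status requests with a sleep between them, which is the ad-hoc workaround the abstract argues against.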

Crossref

Ghent University Academic Bibliography

Open Access LMU

DRIVER Technology Watch Report

Author: Hochstenbach Patrick
Karstens Elbaek Mikael
Russell Rosemary
Schmelz Pedersen Gerd
Van Godtsenhoven Karen
Vanderfeesten Maurice
Publication venue: DRIVER project
Publication date: 01/01/2008

This report is part of the Discovery Workpackage (WP4) and is the third report out of four deliverables. The objective of this report is to give an overview of the latest technical developments in the world of digital repositories, digital libraries and beyond, in order to serve as theoretical and practical input for the technical DRIVER developments, especially those focused on enhanced publications. The report consists of two main parts: one focuses on interoperability standards for enhanced publications; the other consists of three subchapters, which give a landscape picture of current and emerging technologies and communities crucial to DRIVER. These three subchapters cover the GRID, CRIS and LTP communities and technologies. Every chapter contains a theoretical explanation, followed by case studies and the outcomes and opportunities for DRIVER in this field.

Analysis and Synthesis of Metadata Goals for Scientific Data

Author: Bain
Baker
Blank
Bountouri
Bosch
Brazma
Bruce
Buschmann
Committee on Science, Engineering, and Public Policy (US), and Committee on Ensuring the Utility and Integrity of Research Data in a Digital Age
Consultative Committee for Space Data Systems (CCSDS)
Duval
Frenkel
Garvey
Greenberg
Hall
Heidorn
Hey
Higgins
Hjørland
Hubenthal
Jones
Kelling
Klein
Krippendorff
Lide
Lim
Michener
Murray-Rust
National Science Foundation
NSF Task Force on Cyberlearning
Rayner
Ryssevik
Sommerville
Spellman
Spurgin
Stvilia
Westbrook
Zhang
Publication venue: Duke University School of Law
Publication date: 01/01/2012

The proliferation of discipline-specific metadata schemes contributes to artificial barriers that can impede interdisciplinary and transdisciplinary research. The authors considered this problem by examining the domains, objectives, and architectures of nine metadata schemes used to document scientific data in the physical, life, and social sciences. They used a mixed-methods content analysis and Greenberg’s (2005) metadata objectives, principles, domains, and architectural layout (MODAL) framework, and derived 22 metadata-related goals from textual content describing each metadata scheme. Relationships are identified between the domains (e.g., scientific discipline and type of data) and the categories of scheme objectives. For each strong correlation (>0.6), a Fisher’s exact test for nonparametric data was used to determine significance (p < .05). Significant relationships were found between the domains and objectives of the schemes. Schemes describing observational data are more likely to have “scheme harmonization” (compatibility and interoperability with related schemes) as an objective; schemes with the objective “abstraction” (a conceptual model exists separate from the technical implementation) also have the objective “sufficiency” (the scheme defines a minimal amount of information to meet the needs of the community); and schemes with the objective “data publication” do not have the objective “element refinement.” The analysis indicates that many metadata-driven goals expressed by communities are independent of scientific discipline or the type of data, although they are constrained by historical community practices and workflows as well as the technological environment at the time of scheme creation. The analysis reveals 11 fundamental metadata goals for metadata documenting scientific data in support of sharing research data across disciplines and domains. The authors report these results and highlight the need for more metadata-related research, particularly in the context of recent funding agency policy changes.
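For readers unfamiliar with the significance test named in the abstract, the sketch below computes a two-sided Fisher's exact test for a 2x2 table from first principles. The counts in the usage example are invented for illustration and are not the study's data; the function name is likewise an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum over every table that is no more probable than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Invented counts: rows = scheme has objective A (yes/no),
# columns = scheme has objective B (yes/no).
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # 34/70, about 0.4857
```

With so few schemes, only very lopsided tables reach p < .05, which is why an exact test is preferred here over a large-sample approximation.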

Publikationer från KTH

Crossref

Duke Law Scholarship Repository

Digitala Vetenskapliga Arkivet - Academic Archive On-line

espace@Curtin


Automatic analysis and validation of open polymer data

Author: England Nicholas William
Publication venue: University of Cambridge
Publication date: 15/03/2011

A system to automatically extract, analyse, validate and model polymer data has been produced. This system is called the Polymer Informatics Knowledge System (PIKS). Methods of storing polymer data electronically are examined. The majority of data formats are only capable of representing an idealised structure of a macromolecule rather than the actual distribution of structures present in the polymer. Polymer Markup Language (PML) is the only data format capable of storing this information. A novel extension to PML, allowing copolymers produced with a depletion of reactants to be represented, is introduced. Without the extension only Markov chains can be produced. An informatics analysis of Unilever data on the cleaning efficacy of polymers is performed. A representative macromolecule was produced for each polymer sample. Descriptors were calculated over these and used for machine learning to predict the cleaning efficacy. From these models a monomer was identified which was very strongly correlated with good cleaning performance. The monomer in question cannot be revealed as it is a trade secret. Polymer data from the PoLyInfo database are extracted and converted into XML. A summary of the data available in the PoLyInfo database is presented. The PIKS tools were used to automatically validate these data for internal consistency, as well as against another data source. The monomers and polymers were analysed for consistency, and CML reactions were produced for the polymerisation reactions in the database, which were also checked for consistency. The error in the structures was found to be 5.8% for the monomers, 7.3% for the polymers and 2.9% for the reactions. Some of the causes of the discrepancies are presented. The property data from the PoLyInfo database were then used for machine learning. Support Vector Regression (SVR) models of the glass transition temperature were produced both with and without the inclusion of sample characterisation data. Both methods performed similarly: the model without characterisation data produced an RMS error of 19.1 K (r^2=0.96), while the model with it produced an RMS error of 20.1 K (r^2=0.96). This means that more sample characterisation data is required than M_w and M_w/M_n alone.

This work was supported by Unilever.