29 research outputs found

    FAIR Data Model for Chemical Substances: Development Challenges, Management Strategies, and Applications

    Data models for the representation of chemicals are at the core of cheminformatics processing workflows. The standard triple (structure, properties, and descriptors) traditionally formalizes a molecule and has been the dominant paradigm for several decades. While this approach is useful and widely adopted in academia, regulatory bodies and industry have complex use cases and require the concept of chemical substances, applied to multicomponent, advanced, and nanomaterials. The chemical substance data model is an extension of the molecule representation and takes into account the practical aspects of chemical data management, emerging research challenges, and discussions within academia, industry, and regulators. The substance paradigm must handle a composition of multiple components. Mandatory metadata is packed together with the experimental and theoretical data. Data model elucidation poses challenges regarding metadata, ontology utilization, and adoption of FAIR principles. We illustrate the adoption of these good practices by means of the Ambit/eNanoMapper data model, which is applied to chemical substances originating from ECHA REACH dossiers and to the largest nanosafety database in Europe. The Ambit/eNanoMapper model allows development of tools for data curation, FAIRification of large collections of nanosafety data, ontology annotation, data conversion to standards such as JSON, RDF, and HDF5, and emerging linear notations for chemical substances.

    The eNanoMapper database for nanomaterial safety information

    Background: The NanoSafety Cluster, a cluster of projects funded by the European Commission, identified the need for a computational infrastructure for toxicological data management of engineered nanomaterials (ENMs). Ontologies, open standards, and interoperable designs were envisioned to empower a harmonized approach to European research in nanotechnology. This setting provides a number of opportunities and challenges in the representation of nanomaterials data and the integration of ENM information originating from diverse systems. Within this cluster, eNanoMapper works towards supporting the collaborative safety assessment for ENMs by creating a modular and extensible infrastructure for data sharing, data analysis, and building computational toxicology models for ENMs. Results: The eNanoMapper database solution builds on the previous experience of the consortium partners in supporting diverse data through flexible data storage, open source components, and web services. We have recently described the design of the eNanoMapper prototype database along with a summary of challenges in the representation of ENM data and an extensive review of existing nano-related data models, databases, and nanomaterials-related entries in chemical and toxicogenomic databases. This paper continues with a focus on the database functionality exposed through its application programming interface (API), and its use in visualisation and modelling. Considering the preferred community practice of using spreadsheet templates, we developed a configurable spreadsheet parser facilitating user-friendly data preparation and data upload. We further present a web application able to retrieve the experimental data via the API and analyze it with multiple data preprocessing and machine learning algorithms.
    Conclusion: We demonstrate how the eNanoMapper database is used to import and publish online ENM and assay data from several data sources, how the “representational state transfer” (REST) API enables building user-friendly interfaces and graphical summaries of the data, and how these resources facilitate the modelling of reproducible quantitative structure–activity relationships for nanomaterials (NanoQSAR).

    Representing and describing nanomaterials in predictive nanoinformatics

    This Review discusses how a comprehensive system for defining nanomaterial descriptors can enable a safe-and-sustainable-by-design concept for engineered nanomaterials. Engineered nanomaterials (ENMs) enable new and enhanced products and devices in which matter can be controlled at a near-atomic scale (in the range of 1 to 100 nm). However, the unique nanoscale properties that make ENMs attractive may result in as yet poorly known risks to human health and the environment. Thus, new ENMs should be designed in line with the idea of safe-and-sustainable-by-design (SSbD). The biological activity of ENMs is closely related to their physicochemical characteristics; changes in these characteristics may therefore cause changes in the ENMs' activity. In this sense, a set of physicochemical characteristics (for example, chemical composition, crystal structure, size, shape, surface structure) creates a unique 'representation' of a given ENM. The usability of these characteristics or nanomaterial descriptors (nanodescriptors) in nanoinformatics methods such as quantitative structure-activity/property relationship (QSAR/QSPR) models provides exciting opportunities to optimize ENMs at the design stage by improving their functionality and minimizing unforeseen health/environmental hazards. A computational screening of possible versions of novel ENMs would return optimal nanostructures and manage ('design out') hazardous features at the earliest possible manufacturing step. Safe adoption of ENMs on a vast scale will depend on the successful integration of the entire bulk of nanodescriptors extracted experimentally with data from theoretical and computational models. This Review discusses directions for developing appropriate nanomaterial representations and related nanodescriptors to enhance the reliability of computational modelling utilized in designing safer and more sustainable ENMs.

    Your Spreadsheets Can Be FAIR: A Tool and FAIRification Workflow for the eNanoMapper Database

    The field of nanoinformatics is rapidly developing and provides data-driven solutions in the area of nanomaterials (NM) safety. Safe by Design approaches are encouraged and promoted through regulatory initiatives and multiple scientific projects. Experimental data is at the core of nanoinformatics processing workflows for risk assessment. Nanosafety data is predominantly recorded in Excel spreadsheet files. Although spreadsheets are quite convenient for experimentalists, they also pose great challenges for subsequent processing into databases, due to the variability of the templates used, the specific details provided by each laboratory, and the need for proper metadata documentation and formatting. In this paper, we present a workflow to facilitate the conversion of spreadsheets into a FAIR (Findable, Accessible, Interoperable, and Reusable) database, with the pivotal aid of the NMDataParser tool, developed to streamline the mapping of the original file layout into the eNanoMapper semantic data model. The NMDataParser is an open source Java library and application that uses a JSON configuration to define the mapping. We describe the JSON configuration syntax and the approaches applied for parsing different spreadsheet layouts used by the nanosafety community. Examples of using the NMDataParser tool in nanoinformatics workflows are given. Challenging cases are discussed and appropriate solutions are proposed.

    Integrated modernization of the gas-and-air system of a turbocharged diesel engine (21/21)

    Improving the operational and environmental performance of piston engines (PICE) is an urgent task for many engineers and scientists. The article presents the results of an upgrade of the gas-and-air system of a diesel PICE, carried out by changing the turbocharging system's configuration and modernizing the design of the admittance collector. The authors present a review of studies on the subject and a description of the object of the research. The study was based on bench tests at a manufacturing plant and mathematical modeling using the ACTUS program. The results of experimental studies on the main indicators of the basic and upgraded PICEs are presented. The gas exchange processes in the PICE under examination were studied in detail using mathematical modeling. For the given diesel PICE, improvement of the gas-and-air system leads to a growth in charging efficiency of 2.45-3.92%, a decrease in scavenging factor of 3.11-6.31%, and a reduction of specific fuel consumption of up to 3.33%. In the conclusion, new directions for increasing the efficiency of the given PICE are offered.

    Ambit-SMIRKS: a software module for reaction representation, reaction search and structure transformation

    Ambit-SMIRKS is open source software that enables structure transformation via the SMIRKS language and is implemented as an extension of Ambit-SMARTS. As part of the Ambit project, it builds on top of The Chemistry Development Kit (The CDK). Ambit-SMIRKS provides the following functionalities: parsing of SMIRKS linear notations into internal reaction (transformation) representations based on The CDK objects, application of the stored reactions against target (reactant) molecules for actual transformation of the target chemical objects, reaction searching, stereo information handling, product post-processing, etc. The transformations can be applied at various sites of the reactant molecule in several modes: single, non-overlapping, non-identical, non-homomorphic, or an externally specified list of sites, utilizing an efficient substructure searching algorithm. Ambit-SMIRKS handles the molecule's stereo information and supports basic chemical stereo elements implemented in The CDK library. The full SMARTS logical expressions syntax for reactions specification is supported, including recursive SMARTS expressions as well as additional syntax extensions. Since its initial development for the purpose of metabolite generation within Toxtree, the Ambit-SMIRKS module has been used in various chemoinformatics projects, developed both by the authors of the package and by external teams. We show several use cases of the Ambit-SMIRKS software, including standardization of large chemical databases and pathway transformation database and prediction. Ambit-SMIRKS is distributed as a Java library under the LGPL license. More information on use cases and applications, including download links, is available at http://ambit.sourceforge.net/smirks

    RetroTransformDB: A Dataset of Generic Transforms for Retrosynthetic Analysis

    Presently, software tools for retrosynthetic analysis are widely used by organic, medicinal, and computational chemists. Rule-based systems extensively use collections of retro-reactions (transforms). While there are many public datasets with reactions in the synthetic direction (usually non-generic reactions), there are no publicly available databases with generic reactions in a computer-readable format that can be used for retrosynthetic analysis. Here we present RetroTransformDB, a dataset of transforms that we compiled and coded in SMIRKS line notation. The collection comprises more than 100 records, each including the reaction name, SMIRKS linear notation, the functional group to be obtained, and the transform type classification. All SMIRKS transforms were tested syntactically, semantically, and from a chemical point of view in different software platforms. The overall dataset design and the retrosynthetic fitness were analyzed and curated by organic chemistry experts. The RetroTransformDB dataset may be used by open-source and commercial software packages, as well as chemoinformatics tools.

    Making the data available with AMBIT chemoinformatics platform

    Presented at Molecular Informatics Open Source Software (MIOSS), EBI industry workshop, Hinxton, UK, May 18-19, 2016