
    11th German Conference on Chemoinformatics (GCC 2015): Fulda, Germany, 8-10 November 2015.

    Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics.

    The annotation of small molecules remains a major challenge in untargeted mass spectrometry-based metabolomics. Here we critically discuss structure elucidation approaches and software designed to help annotate unknown compounds. Only by first elucidating unknown metabolites is it possible to interpret complex systems biologically, to map compounds to pathways and to build reliable predictive metabolic models for translational and clinical research. The strategies covered include the construction and quality of tandem mass spectral databases, such as the coalition of MassBank repositories, and investigations of MS/MS matching confidence. We present in silico fragmentation tools such as MS-FINDER, CFM-ID, MetFrag, ChemDistiller and CSI:FingerID that can annotate compounds from existing structure databases and that have been used in the CASMI (Critical Assessment of Small Molecule Identification) contests. Furthermore, we cover the use of retention time models from liquid chromatography and the utility of collision cross-section modelling from ion mobility experiments. Workflows and published examples of successfully annotated unknown compounds are included.
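MS/MS matching confidence is often expressed as a spectral similarity score. The following is a minimal sketch of one common choice, assumed here for illustration: a cosine (normalized dot product) score with greedy peak pairing inside an absolute m/z tolerance. It is not the scoring used by MassBank or by any of the tools named above, and the `tol` value and peak lists are hypothetical.

```python
from math import sqrt


def cosine_score(query, library, tol=0.01):
    """Cosine similarity between two centroided MS/MS spectra.

    Each spectrum is a list of (m/z, intensity) pairs. Every query peak is
    greedily paired with the closest unused library peak within `tol` (Da).
    """
    used = set()
    dot = 0.0
    for mz_q, int_q in query:
        best = None  # (m/z difference, library index)
        for j, (mz_l, _) in enumerate(library):
            diff = abs(mz_q - mz_l)
            if j not in used and diff <= tol and (best is None or diff < best[0]):
                best = (diff, j)
        if best is not None:
            used.add(best[1])
            dot += int_q * library[best[1]][1]
    norm_q = sqrt(sum(i * i for _, i in query))
    norm_l = sqrt(sum(i * i for _, i in library))
    if norm_q == 0.0 or norm_l == 0.0:
        return 0.0
    return dot / (norm_q * norm_l)


# Hypothetical peak lists (not real library spectra).
query = [(91.054, 100.0), (119.049, 40.0), (147.044, 10.0)]
candidate = [(91.060, 100.0), (119.049, 40.0)]
print(round(cosine_score(query, candidate), 3))  # prints 0.996
```

Greedy pairing is simple but not an optimal peak assignment, and many tools also transform intensities (for example by square root) before scoring, which this sketch omits.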

    Leveraging 3D chemical similarity, target and phenotypic data in the identification of drug-protein and drug-adverse effect associations

    Additional file 5: Figure S4. Number of side effects and targets for each drug in the target-phenotype model

    The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching

    Background: The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms ranging from chemical structure canonicalization to molecular descriptor calculations and pharmacophore perception. It is used in drug discovery, metabolomics, and toxicology. Over the last 10 years, however, the code base has grown significantly, resulting in many complex interdependencies among components and poor performance of many algorithms. Results: We report improvements made in CDK v2.0 since the v1.2 release series, specifically addressing the increased functional complexity and poor performance. We first summarize the addition of new functionality, such as atom typing and molecular formula handling, and improvements to existing functionality that have led to significantly better performance for substructure searching, molecular fingerprints, and rendering of molecules. Second, we outline how the CDK has evolved with respect to quality control and the approaches we have adopted to ensure stability, including a code review mechanism. Conclusions: This paper highlights our continued efforts to provide a community-driven, open source cheminformatics library, and shows that such collaborative projects can thrive over extended periods of time, resulting in a high-quality and performant library. By taking advantage of community support and contributions, we show that an open source cheminformatics project can act as a peer-reviewed publishing platform for scientific computing software.
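Molecular formula handling is one of the CDK features mentioned above. The CDK itself is a Java library, so the following Python function is only a stand-in sketch, not the CDK API. It assumes flat formulas (no brackets, charges or isotope labels) and sums monoisotopic masses, the kind of value used when matching formulas against high-resolution MS data.

```python
import re
from collections import Counter

# Monoisotopic masses (u) of the most abundant isotopes; a deliberately small subset.
MONOISOTOPIC = {
    "H": 1.00782503207,
    "C": 12.0,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Parse a flat formula such as 'C6H12O6' into element counts."""
    counts = Counter()
    pos = 0
    for m in TOKEN.finditer(formula):
        if m.start() != pos:  # skipped characters mean the formula is malformed
            raise ValueError(f"unexpected text at position {pos} in {formula!r}")
        element, n = m.group(1), m.group(2)
        if element not in MONOISOTOPIC:
            raise ValueError(f"unsupported element {element!r}")
        counts[element] += int(n) if n else 1
        pos = m.end()
    if pos != len(formula) or not counts:
        raise ValueError(f"could not parse {formula!r}")
    return counts


def monoisotopic_mass(formula):
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


print(f"{monoisotopic_mass('C6H12O6'):.4f}")  # glucose, prints 180.0634
```

For an [M+H]+ ion one would add the proton mass (about 1.007276 u); the element table would need extending for real use.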

    ChemScanner: extraction and re-use(ability) of chemical information from common scientific documents containing ChemDraw files

    We developed ChemScanner, software for extracting chemical information from ChemDraw binary (CDX) or ChemDraw XML-based (CDXML) files and for retrieving ChemDraw schemes from DOC, DOCX or XML documents. This can facilitate the reuse of chemical information embedded in the diverse documents that serve as standard storage and communication instruments in the chemical sciences (e.g. students' theses, PhD theses, or publications). The extracted information is processed into reactions and molecules, as well as additional text and values, and can be accessed via the ChemScanner UI. ChemScanner supports export to Excel and CML, direct import of the extracted data into the Open Source ELN Chemotion, and “copy and paste” of selected information. The software was designed with a focus on processing documents with embedded molecular structure information in CDX or CDXML, as these are the most common file formats for chemical drawings. The project aims to support chemists in their efforts to reuse chemistry research data by providing the missing tools for the automated assembly of reaction data.
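To illustrate what extracting chemical information from a CDXML file involves, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the CDXML conventions that a `fragment` element holds atom nodes (`n`, whose optional `Element` attribute is an atomic number, carbon when absent) and bonds (`b`, with `B`/`E` endpoint ids and an optional `Order`), and that page-level text sits in `t` elements made of `s` runs. The sample document is hand-written and hypothetical; this is not ChemScanner's code, and it ignores binary CDX, abbreviation groups and reaction schemes.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical CDXML snippet: one fragment (C-C-O) and a caption.
SAMPLE = """<CDXML><page id="1">
  <fragment id="2">
    <n id="3"/><n id="4"/><n id="5" Element="8"/>
    <b id="6" B="3" E="4"/><b id="7" B="4" E="5"/>
  </fragment>
  <t id="8"><s>ethanol, 2 equiv.</s></t>
</page></CDXML>"""


def extract(cdxml_text):
    """Return (fragments, captions) from a CDXML string."""
    root = ET.fromstring(cdxml_text)
    fragments = []
    for frag in root.iter("fragment"):
        # Atomic number per node id; carbon (6) assumed when Element is absent.
        atoms = {n.get("id"): int(n.get("Element", "6")) for n in frag.findall("n")}
        # Bond order assumed single ("1") when Order is absent.
        bonds = [(b.get("B"), b.get("E"), b.get("Order", "1")) for b in frag.findall("b")]
        fragments.append({"atoms": atoms, "bonds": bonds})
    # Only direct children of <page>, so atom labels nested inside <n> are skipped.
    captions = ["".join(s.text or "" for s in t.findall("s"))
                for page in root.iter("page") for t in page.findall("t")]
    return fragments, captions


fragments, captions = extract(SAMPLE)
print(fragments[0]["atoms"], captions)
```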

    Chemotion ELN: an Open Source electronic lab notebook for chemists in academia

    The development of an electronic lab notebook (ELN) for researchers working in the chemical sciences is presented. The web-based application is available as Open Source software that offers modern solutions for chemical researchers. The Chemotion ELN is equipped with the basic functionality needed to acquire and process chemical data, in particular for working with molecular structures and performing calculations based on molecular properties. The ELN supports the planning, description, storage, and management of the routine work of organic chemists. It also provides tools for communicating and sharing the recorded research data among colleagues. Meeting the requirements of a state-of-the-art research infrastructure, the ELN allows users to search for molecules and reactions not only within their own data but also in conventional external sources such as SciFinder and PubChem. The development addresses the growing dependence of scientific activity on the availability of digital information by providing Open Source instruments to record and reuse research data. The current version of the ELN has been in use in our chemistry research group for over half a year; it serves as a common infrastructure for chemistry research and enables researchers to build their own databases of digital information, a prerequisite for the detailed, systematic investigation and evaluation of chemical reactions and mechanisms.
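The “calculations based on molecular properties” in an ELN reaction table typically include amounts, equivalents and yields. A minimal sketch, assuming masses in mg and molar masses in g/mol (so mg / (g/mol) gives mmol); the class and function names and all example values are hypothetical and do not reflect Chemotion's implementation.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One row of a hypothetical reaction table."""
    name: str
    molar_mass: float  # g/mol
    mass_mg: float     # weighed amount in mg

    @property
    def mmol(self) -> float:
        # mg / (g/mol) = mmol
        return self.mass_mg / self.molar_mass


def equivalents(components, reference):
    """Equivalents of each component relative to the reference (limiting) reagent."""
    ref = reference.mmol
    return {c.name: c.mmol / ref for c in components}


def percent_yield(product_mass_mg, product_molar_mass, limiting_mmol, ratio=1.0):
    """Isolated yield; `ratio` = mol product expected per mol limiting reagent."""
    theoretical_mmol = limiting_mmol * ratio
    return 100.0 * (product_mass_mg / product_molar_mass) / theoretical_mmol


# Hypothetical values for illustration only.
a = Component("reagent A", 106.12, 212.24)  # 2.0 mmol
b = Component("reagent B", 50.0, 150.0)     # 3.0 mmol
print(equivalents([a, b], reference=a))
print(percent_yield(240.0, 150.0, a.mmol))
```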

    Predicting drug–drug interactions through drug structural similarities and interaction networks incorporating pharmacokinetics and pharmacodynamics knowledge

    Additional file 1. Table S1. Average structural similarity scores for the DDI/non-DDI pairs in the network of each De. Table S2-1. Top 10 predicted drugs with DDIs for warfarin. Table S2-2. Top 10 predicted drugs with DDIs for simvastatin. Table S3. Four-fold cross-validation test results. Text S1. Drugs that show DDIs (DrugBank ID). Figure S1. Illustration of the construction of training and test sets for four-fold cross-validation. Figure S2. ROC curves using the models with score set 1 in four-fold cross-validation.
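Table S1 reports average structural similarity scores over drug pairs. The similarity measure is not stated in this listing; a common choice is the Tanimoto coefficient on binary fingerprints, |A ∩ B| / |A ∪ B|, which the following sketch assumes. The fingerprints are hypothetical sets of on-bit indices, not real drug or DDI data.

```python
from statistics import mean


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(a & b) / union


def mean_pair_similarity(pairs, fingerprints):
    """Average Tanimoto similarity over a list of (drug, drug) pairs."""
    return mean(tanimoto(fingerprints[x], fingerprints[y]) for x, y in pairs)


# Hypothetical fingerprints and pair lists.
fps = {
    "drug1": {1, 2, 3, 4},
    "drug2": {2, 3, 4, 5},
    "drug3": {7, 8},
}
ddi_pairs = [("drug1", "drug2")]
non_ddi_pairs = [("drug1", "drug3"), ("drug2", "drug3")]
print(mean_pair_similarity(ddi_pairs, fps), mean_pair_similarity(non_ddi_pairs, fps))
```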

    The octet rule in chemical space: Generating virtual molecules

    We present a generator of virtual molecules that selects valid chemistry on the basis of the octet rule. We also introduce a mesomer group key that allows fast detection of duplicates among the generated structures. Compared to existing approaches, our model is simpler and faster, generates new chemistry and avoids invalid chemistry. Its versatility is illustrated by the correct generation of molecules containing third-row elements and a surprisingly adept handling of complex boron chemistry. Because it uses no empirical parameters, our model is designed to remain valid in unexplored regions of chemical space. A first unexpected finding is the high prevalence of dipolar structures among the generated molecules.
    Comment: 24 pages, 10 figures
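The octet criterion can be checked atom by atom from bond orders and lone pairs: an atom is counted as satisfied when 2 × (sum of bond orders) + 2 × (lone pairs) equals 8 (2 for hydrogen), and its formal charge is V - N - B/2, with V valence electrons, N nonbonding electrons and B bonding electrons. The sketch applies these textbook rules to diazomethane drawn as H2C=N(+)=N(-), a dipolar structure of the kind the abstract mentions. It is only a validity check, not the authors' generator or their mesomer group key, and its strict octet does not reproduce the paper's treatment of third-row elements or boron.

```python
# Valence electron counts for a few second-period main-group elements plus H.
VALENCE = {"H": 1, "B": 3, "C": 4, "N": 5, "O": 6, "F": 7}


def formal_charge(element, bond_order_sum, lone_pairs):
    # FC = V - N - B/2, with N = 2 * lone_pairs and B = 2 * bond_order_sum
    return VALENCE[element] - 2 * lone_pairs - bond_order_sum


def satisfies_octet(element, bond_order_sum, lone_pairs):
    target = 2 if element == "H" else 8  # duet for H, strict octet otherwise
    return 2 * bond_order_sum + 2 * lone_pairs == target


def check_structure(atoms, bonds):
    """atoms: list of (element, lone_pairs); bonds: list of (i, j, order)."""
    order_sum = [0] * len(atoms)
    for i, j, order in bonds:
        order_sum[i] += order
        order_sum[j] += order
    return [(el, satisfies_octet(el, order_sum[k], lp), formal_charge(el, order_sum[k], lp))
            for k, (el, lp) in enumerate(atoms)]


# Diazomethane: every atom obeys the octet/duet rule,
# but the two nitrogens carry opposite formal charges.
atoms = [("C", 0), ("N", 0), ("N", 2), ("H", 0), ("H", 0)]
bonds = [(0, 1, 2), (1, 2, 2), (0, 3, 1), (0, 4, 1)]
for element, ok, charge in check_structure(atoms, bonds):
    print(element, ok, charge)
```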

    A Reaction Database for Small Molecule Pharmaceutical Processes Integrated with Process Information

    This article describes the development of a reaction database that collects data on multiphase reactions involved in small molecule pharmaceutical processes, together with a search engine for retrieving the data needed to investigate reaction-separation schemes, such as the role of organic solvents in improving reaction performance. The focus of this reaction database is to provide a data-rich environment with process information to assist in the early-stage synthesis of pharmaceutical products. The database is structured in terms of reaction type classification; compounds participating in the reaction; organic solvents used and their function; information on single-step and multistep reactions; target products; reaction conditions; and reaction data. Information on reactor scale-up and separation, together with other relevant information for each reaction and reference, is also available in the database. Additionally, the information retrieved from the database can be evaluated for sustainability using well-known “green” metrics published in the scientific literature. The application of the database is illustrated through the synthesis of ibuprofen, for which data on different reaction pathways were retrieved from the database and compared using “green” chemistry metrics.
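Atom economy is one of the best-known “green” metrics: 100 × (molar mass of the desired product) / (sum of molar masses of all reactants). The sketch applies it to the textbook overall stoichiometry of the BHC route to ibuprofen, C10H14 + C4H6O3 + H2 + CO → C13H18O2 + C2H4O2, with acetic acid as the co-product. The atomic weights are rounded and the result is illustrative, not data retrieved from the database described above.

```python
# Standard atomic weights, rounded to three decimals.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999}


def molar_mass(composition):
    return sum(ATOMIC_WEIGHT[el] * n for el, n in composition.items())


def atom_economy(reactants, product):
    """Percent of total reactant mass that ends up in the desired product."""
    return 100.0 * molar_mass(product) / sum(molar_mass(r) for r in reactants)


# Overall BHC stoichiometry: C10H14 + C4H6O3 + H2 + CO -> C13H18O2 + C2H4O2
reactants = [
    {"C": 10, "H": 14},        # isobutylbenzene
    {"C": 4, "H": 6, "O": 3},  # acetic anhydride
    {"H": 2},                  # hydrogen
    {"C": 1, "O": 1},          # carbon monoxide
]
ibuprofen = {"C": 13, "H": 18, "O": 2}
acetic_acid = {"C": 2, "H": 4, "O": 2}  # co-product, counted as waste here

print(f"{atom_economy(reactants, ibuprofen):.1f} %")  # prints 77.5 %
```

Counting recovered acetic acid as a useful product instead of waste would raise the effective figure; the metric itself says nothing about solvents, catalysts or yield, which is why it is usually combined with others.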

    kGCN: a graph-based deep learning framework for chemical structures

    Deep learning is emerging as an important technology for performing various tasks in cheminformatics. In particular, graph convolutional neural networks (GCNs) have been reported to perform well in many types of molecular prediction tasks. Although GCNs show considerable potential in various applications, using them appropriately to obtain reasonable and reliable predictions requires a thorough understanding of both GCNs and programming. To make GCNs useful to a range of users, from chemists to cheminformaticians, an open-source GCN tool, kGCN, is introduced. To support users with various levels of programming skill, kGCN includes three interfaces: a graphical user interface (GUI) built on KNIME for users with limited programming skills, such as chemists, as well as command-line and Python library interfaces for users with advanced programming skills, such as cheminformaticians. To support the three steps required to build a prediction model, i.e., pre-processing, model tuning, and interpretation of results, kGCN includes functions for typical pre-processing, Bayesian optimization for automatic model tuning, and visualization of each atom's contribution to the prediction. kGCN supports three types of approaches: single-task, multi-task, and multi-modal predictions. As a representative case study, kGCN is used to predict compound-protein interactions for four matrix metalloproteases, MMP-3, -9, -12 and -13, in inhibition assays. The visualization of atomic contributions is useful for validating prediction models and for designing molecules based on them, realizing “explainable AI” for understanding the factors that affect AI predictions. kGCN is available at https://github.com/clinfo
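As a reminder of what a graph convolution computes, here is one layer of the widely used propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), written in plain Python on the heavy-atom graph of ethanol (C-C-O). The one-hot atom features and the weight matrix are hypothetical; kGCN's own layers, options and API may differ and are not shown here.

```python
from math import sqrt


def normalized_adjacency(n_atoms, bonds):
    """D^-1/2 (A + I) D^-1/2 for an undirected molecular graph."""
    a = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        a[i][j] = a[j][i] = 1.0
    for i in range(n_atoms):
        a[i][i] = 1.0  # self-loop so each atom keeps its own features
    deg = [sum(row) for row in a]
    return [[a[i][j] / sqrt(deg[i] * deg[j]) for j in range(n_atoms)]
            for i in range(n_atoms)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, h, w):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]


# Ethanol heavy atoms: 0 = C, 1 = C, 2 = O; features are one-hot [is_C, is_O].
a_hat = normalized_adjacency(3, [(0, 1), (1, 2)])
h = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
w = [[1.0, -1.0], [0.5, 2.0]]  # hypothetical weights; learned in practice
h_next = gcn_layer(a_hat, h, w)
print(h_next)
```

Each output row mixes an atom's features with those of its bonded neighbours; stacking layers widens that neighbourhood, and per-atom rows like these are what atom-contribution visualizations are ultimately attributed to.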