12 research outputs found

    New developments on the cheminformatics open workflow environment CDK-Taverna

    Background: The computational processing and analysis of small molecules is at the heart of cheminformatics and structural bioinformatics and their application in, e.g., metabolomics or drug discovery. Pipelining or workflow tools allow for the Lego™-like, graphical assembly of I/O modules and algorithms into a complex workflow which can be easily deployed, modified and tested without the hassle of implementing it in a monolithic application. The CDK-Taverna project aims at building a free open-source cheminformatics pipelining solution through the combination of different open-source projects such as Taverna, the Chemistry Development Kit (CDK) or the Waikato Environment for Knowledge Analysis (WEKA). A first integrated version 1.0 of CDK-Taverna was recently released to the public. Results: The CDK-Taverna project was migrated to the most up-to-date versions of its foundational software libraries with a complete re-engineering of its worker architecture (version 2.0). 64-bit computing and multi-core usage by parallel threads are now supported to allow for fast in-memory processing and analysis of large sets of molecules. Earlier deficiencies such as workarounds for iterative data reading have been removed. The combinatorial chemistry related reaction enumeration features are considerably enhanced. Additional functionality for calculating a natural product likeness score for small molecules is implemented to identify possible drug candidates. Finally, the data analysis capabilities are extended with new workers that provide access to the open-source WEKA library for clustering and machine learning as well as training and test set partitioning. The new features are outlined with usage scenarios. Conclusions: CDK-Taverna 2.0, as an open-source cheminformatics workflow solution, has matured into a freely available and increasingly powerful tool for the biosciences.
The combination of the new CDK-Taverna worker family with the already available workflows developed by a lively Taverna community and published on myexperiment.org enables molecular scientists to quickly calculate, process and analyse molecular data as typically found in, e.g., today's systems biology scenarios.

    The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching

    Open access article. Background: The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms ranging from chemical structure canonicalization to molecular descriptor calculations and pharmacophore perception. It is used in drug discovery, metabolomics, and toxicology. Over the last 10 years, however, the code base has grown significantly, resulting in many complex interdependencies among components and poor performance of many algorithms. Results: We report improvements to the CDK v2.0 since the v1.2 release series, specifically addressing the increased functional complexity and poor performance. We first summarize the addition of new functionality, such as atom typing and molecular formula handling, and improvements to existing functionality that have led to significantly better performance for substructure searching, molecular fingerprints, and rendering of molecules. Second, we outline how the CDK has evolved with respect to quality control and the approaches we have adopted to ensure stability, including a code review mechanism. Conclusions: This paper highlights our continued efforts to provide a community driven, open source cheminformatics library, and shows that such collaborative projects can thrive over extended periods of time, resulting in a high-quality and performant library. By taking advantage of community support and contributions, we show that an open source cheminformatics project can act as a peer reviewed publishing platform for scientific computing software.

    A retrosynthetic biology approach to metabolic pathway design for therapeutic production

    Background: Synthetic biology is used to develop cell factories for production of chemicals by constructively importing heterologous pathways into industrial microorganisms. In this work we present a retrosynthetic approach to the production of therapeutics with the goal of developing an in situ drug delivery device in host cells. Retrosynthesis, a concept originally proposed for synthetic chemistry, iteratively applies reversed chemical transformations (reversed enzyme-catalyzed reactions in the metabolic space) starting from a target product to reach precursors that are endogenous to the chassis. So far, a wider adoption of retrosynthesis into the manufacturing pipeline has been hindered by the complexity of enumerating all feasible biosynthetic pathways for a given compound. Results: In our method, we efficiently address the complexity problem by coding substrates, products and reactions into molecular signatures. Metabolic maps are represented using hypergraphs and the complexity is controlled by varying the specificity of the molecular signature. Furthermore, our method enables candidate pathways to be ranked to determine which ones are best to engineer. The proposed ranking function can integrate data from different sources such as host compatibility for inserted genes, the estimation of steady-state fluxes from the genome-wide reconstruction of the organism's metabolism, or the estimation of metabolite toxicity from experimental assays. We use several machine-learning tools in order to estimate enzyme activity and reaction efficiency at each step of the identified pathways.
Examples of production in bacteria and yeast for two antibiotics and for one antitumor agent, as well as for several essential metabolites, are outlined. Conclusions: We present here a unified framework that integrates diverse techniques involved in the design of heterologous biosynthetic pathways through a retrosynthetic approach in the reaction signature space. Our engineering methodology enables the flexible design of industrial microorganisms for the efficient on-demand production of chemical compounds with therapeutic applications.
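
The iterative backward search described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: compounds are plain strings rather than molecular signatures, no ranking is performed, and the reaction table, the compound names and the function `enumerate_pathways` are hypothetical placeholders chosen only for illustration. Each reaction is treated as a hyperedge from a set of substrates to one product, and reversed reactions are applied from the target until only endogenous precursors remain.

```python
from collections import deque

# Hypothetical toy network: reaction id -> (set of substrates, product).
# These names are placeholders, not real metabolites or enzymes.
REACTIONS = {
    "r1": ({"A", "B"}, "C"),
    "r2": ({"C"}, "TARGET"),
    "r3": ({"D"}, "TARGET"),
    "r4": ({"X"}, "D"),
}
ENDOGENOUS = {"A", "B"}  # compounds assumed to be native to the chassis


def enumerate_pathways(target, reactions, endogenous, max_steps=5):
    """Return lists of reaction ids that produce `target` from endogenous compounds.

    Reactions are applied in reverse, breadth first. A search state holds the
    compounds that still need a producing reaction and the reactions used so far.
    """
    pathways = []
    queue = deque([(frozenset([target]) - endogenous, ())])
    while queue:
        unresolved, used = queue.popleft()
        if not unresolved:
            # Every required compound is endogenous: a complete pathway.
            pathways.append(list(used))
            continue
        if len(used) >= max_steps:
            # Bound the search depth to keep the enumeration finite.
            continue
        compound = min(unresolved)  # deterministic choice of what to resolve next
        for rid in sorted(reactions):
            substrates, product = reactions[rid]
            if product != compound or rid in used:
                continue
            # Reverse the reaction: the product is replaced by its
            # non-endogenous substrates.
            remaining = (unresolved - {compound}) | (frozenset(substrates) - endogenous)
            queue.append((remaining, used + (rid,)))
    return pathways


if __name__ == "__main__":
    print(enumerate_pathways("TARGET", REACTIONS, ENDOGENOUS))
```

In this toy network the branch through `r3` and `r4` is discarded because `X` is neither endogenous nor produced by any reaction. The `max_steps` bound plays a role loosely analogous to the complexity control mentioned in the abstract, although the abstract achieves that by varying signature specificity.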

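    Molecular simulation of the effect of asphaltene aggregation on the viscosity of asphaltene-solvent systems (original title: Simulación molecular del efecto de la agregación de asfaltenos en la viscosidad de sistemas asfalteno-solvente)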

    This work determines the effect of asphaltene concentration and structure type on aggregation and viscosity in asphaltene/solvent systems using molecular simulation techniques. The study covers representations formed by a mixture of four asphaltene molecules and systems composed exclusively of archipelago-type and island-type asphaltenes, at asphaltene weight concentrations of 15 and 30 % in toluene and n-heptane. First, the viscosity of n-heptane and toluene mixtures is determined using three molecular simulation techniques, in order to validate the simulation tool and the viscosity calculation procedure. Among these techniques, the G-K method produces the best results and offers the best balance between result quality and required simulation time. The NEMD technique yields acceptable results for the system viscosity, but requires longer simulation times and the determination of appropriate shear rate values. The rNEMD technique, in turn, is fast and removes the need to determine a set of shear rate values, but shows large fluctuations in the measured shear rate and viscosity. In the study of the effect of asphaltene molecular structure on the average aggregate size, the asphaltene molecules used are found to be strongly consistent with the definition of asphaltenes as soluble in toluene and insoluble in solvents such as n-heptane. Furthermore, the number of molecules per aggregate in toluene and n-heptane lies between two and four. Archipelago-type molecules are found to show little aggregation in toluene and n-heptane.
Finally, aggregation in n-heptane is found to increase with the asphaltene concentration in solution, whereas in toluene aggregation remains low regardless of the concentration or the asphaltene molecule used. Lastly, the viscosity of the different asphaltene representations in n-heptane and toluene is determined. The viscosity is not directly affected by the structural form of the asphaltene representation, but it is affected by the concentration and the type of solvent. Finally, the viscosity does not change appreciably with the average aggregate size; that is, aggregation varies over time, but the calculation tool is not sensitive enough to detect the change in viscosity caused by the variation in aggregate size. Maestrí
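
As a rough illustration of the first viscosity technique named in this abstract, the sketch below assumes that "G-K" refers to the Green-Kubo relation, in which the shear viscosity is the time integral of the autocorrelation of an off-diagonal pressure-tensor component, scaled by V / (k_B T). The function names, the trapezoidal integration and the input series are illustrative assumptions, not the thesis's actual procedure or data.

```python
def autocorrelation(series, max_lag):
    """Time-averaged autocorrelation <P(0) P(t)> for lags 0..max_lag."""
    n = len(series)
    return [
        sum(series[i] * series[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(max_lag + 1)
    ]


def green_kubo_viscosity(pxy, dt, volume, temperature, max_lag, k_b=1.380649e-23):
    """Estimate shear viscosity from a pressure-tensor time series.

    pxy         -- samples of an off-diagonal pressure component (assumed in Pa)
    dt          -- sampling interval (assumed in s)
    volume      -- simulation box volume (assumed in m^3)
    temperature -- temperature (assumed in K)
    max_lag     -- number of lags to integrate over; must be below len(pxy)
    """
    acf = autocorrelation(pxy, max_lag)
    # Trapezoidal rule over the truncated autocorrelation function.
    integral = dt * (sum(acf) - 0.5 * (acf[0] + acf[-1]))
    return volume / (k_b * temperature) * integral
```

In practice the result depends strongly on where the integral is truncated (`max_lag`) and on how long the pressure series is, and several off-diagonal components are usually averaged.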

    Engineering Cellular Transport Systems to Enhance Lignocellulose Bioconversion

    Lignocellulosic biomass represents a renewable domestic feedstock that can support large-scale biochemical production processes for fuels and specialty chemicals. However, cost-effective conversion of lignocellulosic sugars into valuable chemicals by microorganisms still remains a challenge. Biomass recalcitrance to saccharification, microbial substrate utilization, bioproduct titer toxicity, and toxic chemicals associated with chemical pretreatments are at the center of the bottlenecks limiting further commercialization of lignocellulose conversion. Genetic and metabolic engineering has allowed researchers to manipulate microorganisms to overcome some of these challenges, but new innovative approaches are needed to make the process more commercially viable. Transport proteins represent an underexplored target in genetic engineering that can potentially help to control the input of lignocellulosic substrate and output of products/toxins in microbial biocatalysts. In this work, I characterize and explore the use of transport systems to increase substrate utilization, conserve energy, increase tolerance, and enhance biocatalyst performance. Dissertation/Thesis. Doctoral Dissertation Biological Design 201

    Automated de novo metabolite identification with mass spectrometry and cheminformatics

    In this thesis, new algorithms and methods that enable the de novo identification of metabolites have been developed. The aim was to find methods to propose candidate structures for unknown metabolites using MSn data as the starting point. These methods have been integrated into a semi-automated pipeline to identify new human metabolites. The discovery of new metabolites will improve our capability to understand disease via its metabolic fingerprint, to develop personalized treatments and to discover new drugs. In addition, the cheminformatics methods presented in this thesis increase our understanding of the properties of human metabolites. The research described in this thesis has shown that the success of de novo metabolite identification relies on the synergy between analytical chemistry methods (i.e. LC-MSn) and cheminformatics tools. Netherlands Organization for Applied Scientific Research (TNO); Netherlands Metabolomics Centre; UBL - phd migration 201

    Kinetic model construction using chemoinformatics

    Kinetic models of chemical processes not only provide an alternative to costly experiments; they also have the potential to accelerate the pace of innovation in developing new chemical processes or in improving existing ones. Kinetic models are most powerful when they reflect the underlying chemistry by incorporating elementary pathways between individual molecules. The downside of this high level of detail is that the complexity and size of the models also steadily increase, such that the models eventually become too difficult to be manually constructed. Instead, computers are programmed to automate the construction of these models, and make use of graph theory to translate chemical entities such as molecules and reactions into computer-understandable representations. This work studies the use of automated methods to construct kinetic models. More particularly, the need to account for the three-dimensional arrangement of atoms in molecules and reactions of kinetic models is investigated and illustrated by two case studies. First of all, the thermal rearrangement of two monoterpenoids, cis- and trans-2-pinanol, is studied. A kinetic model that accounts for the differences in reactivity and selectivity of both pinanol diastereomers is proposed. Secondly, a kinetic model for the pyrolysis of the fuel “JP-10” is constructed and highlights the use of state-of-the-art techniques for the automated estimation of thermochemistry of polycyclic molecules. A new code is developed for the automated construction of kinetic models and takes advantage of the advances made in the field of chemo-informatics to tackle fundamental issues of previous approaches. Novel algorithms are developed for three important aspects of automated construction of kinetic models: the estimation of symmetry of molecules and reactions, the incorporation of stereochemistry in kinetic models, and the estimation of thermochemical and kinetic data using scalable structure-property methods. 
Finally, the application of the code is illustrated by the automated construction of a kinetic model for alkylsulfide pyrolysis.

    Development and implementation of in silico molecule fragmentation algorithms for the cheminformatics analysis of natural product spaces

    Computational methodologies extracting specific substructures like functional groups or molecular scaffolds from input molecules can be grouped under the term “in silico molecule fragmentation”. They can be used to investigate what specifically characterises a heterogeneous compound class, like pharmaceuticals or Natural Products (NP), and in which aspects they are similar or dissimilar. The aim is to determine what specifically characterises NP structures to transfer patterns favourable for bioactivity to drug development. As part of this thesis, the first algorithmic approach to in silico deglycosylation, the removal of glycosidic moieties for the study of aglycones, was developed with the Sugar Removal Utility (SRU) (Publication A). The SRU has also proven useful for investigating NP glycoside space. It was applied to one of the largest open NP databases, COCONUT (COlleCtion of Open Natural prodUcTs), for this purpose (Publication B). A contribution was made to the Chemistry Development Kit (CDK) by developing the open Scaffold Generator Java library (Publication C). Scaffold Generator can extract different scaffold types and dissect them into smaller parent scaffolds following the scaffold tree or scaffold network approach. Publication D describes the OngLai algorithm, the first automated method to identify homologous series in input datasets, group the member structures of each group, and extract their common core. To support the development of new fragmentation algorithms, the open Java rich client graphical user interface application MORTAR (MOlecule fRagmenTAtion fRamework) was developed as part of this thesis (Publication E). MORTAR allows users to quickly execute the steps of importing a structural dataset, applying a fragmentation algorithm, and visually inspecting the results in different ways. All software developed as part of this thesis is freely and openly available (see https://github.com/JonasSchaub).