36 research outputs found

New developments on the cheminformatics open workflow environment CDK-Taverna

Author: Jayaseelan Kalai Vanii
Neumann Stefan
Steinbeck Christoph
Truszkowski Andreas
Willighagen Egon L
Zielesny Achim
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The computational processing and analysis of small molecules is at the heart of cheminformatics and structural bioinformatics and their application in e.g. metabolomics or drug discovery. Pipelining or workflow tools allow for the Lego™-like, graphical assembly of I/O modules and algorithms into a complex workflow which can be easily deployed, modified and tested without the hassle of implementing it into a monolithic application. The CDK-Taverna project aims at building a free open-source cheminformatics pipelining solution through combination of different open-source projects such as Taverna, the Chemistry Development Kit (CDK) or the Waikato Environment for Knowledge Analysis (WEKA). A first integrated version 1.0 of CDK-Taverna was recently released to the public. Results: The CDK-Taverna project was migrated to the most up-to-date versions of its foundational software libraries with a complete re-engineering of its worker architecture (version 2.0). 64-bit computing and multi-core usage by parallel threads are now supported to allow for fast in-memory processing and analysis of large sets of molecules. Earlier deficiencies like workarounds for iterative data reading are removed. The combinatorial chemistry related reaction enumeration features are considerably enhanced. Additional functionality for calculating a natural product likeness score for small molecules is implemented to identify possible drug candidates. Finally, the data analysis capabilities are extended with new workers that provide access to the open-source WEKA library for clustering and machine learning as well as training and test set partitioning. The new features are outlined with usage scenarios. Conclusions: CDK-Taverna 2.0 as an open-source cheminformatics workflow solution has matured into a freely available and increasingly powerful tool for the biosciences. The combination of the new CDK-Taverna worker family with the already available workflows developed by a lively Taverna community and published on myexperiment.org enables molecular scientists to quickly calculate, process and analyse molecular data as typically found in e.g. today's systems biology scenarios.

Maastricht University Research Portal

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

CDK-Taverna: an open workflow environment for cheminformatics

Author: Kuhn Thomas
Steinbeck Christoph
Willighagen Egon L
Zielesny Achim
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Small molecules are of increasing interest for bioinformatics in areas such as metabolomics and drug discovery. The recent release of large open access chemistry databases generates a demand for flexible tools to process them and discover new knowledge. To freely support open science based on these data resources, it is desirable for the processing tools to be open source and available for everyone. Results: Here we describe a novel combination of the workflow engine Taverna and the cheminformatics library Chemistry Development Kit (CDK) resulting in an open source workflow solution for cheminformatics. We have implemented more than 160 different workers to handle specific cheminformatics tasks. We describe the applications of CDK-Taverna in various usage scenarios. Conclusions: The combination of the workflow engine Taverna and the Chemistry Development Kit provides the first open source cheminformatics workflow solution for the biosciences. With the Taverna community working towards a more powerful workflow engine and a more user-friendly user interface, CDK-Taverna has the potential to become a free alternative to existing proprietary workflow tools.

Maastricht University Research Portal

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Publikationer från Uppsala Universitet

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line

XMPP for cloud computing in bioinformatics supporting discovery and invocation of asynchronous web services

Author: A Labarga
AR Jones
B Wallner
BioMoby Consortium
C Steinbeck
D Smedley
E Jain
E Willighagen
Egon L Willighagen
EW Sayers
GL Holliday
H Stockinger
H Sugawara
Jarl ES Wikberg
Johannes Wagener
L Stein
LM Vaquero
M Hucka
M Lapins
MA Larkin
MD Wilkinson
MWEJ Fiers
N Adams
O Spjuth
Ola Spjuth
P Fisher
P Murray-Rust
PBT Neerincx
R Kottmann
RD Dowell
S Hoon
S Hunter
S Kaarthik
S Kerrien
S Kuhn
S Miyazaki
T Oinn
UniProt Consortium
X Dong
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Life sciences make heavy use of the web for both data provision and analysis. However, the increasing amount of available data and the diversity of analysis tools call for machine accessible interfaces in order to be effective. HTTP-based Web service technologies, like the Simple Object Access Protocol (SOAP) and REpresentational State Transfer (REST) services, are today the most common technologies for this in bioinformatics. However, these methods have severe drawbacks, including lack of discoverability and the inability of services to send status notifications. Several complementary workarounds have been proposed, but the results are ad-hoc solutions of varying quality that can be difficult to use. Results: We present a novel approach based on the open standard Extensible Messaging and Presence Protocol (XMPP), consisting of an extension (IO Data) to comprise discovery, asynchronous invocation, and definition of data types in the service. Because XMPP cloud services are capable of asynchronous communication, clients do not have to poll repetitively for status; instead, the service sends the results back to the client upon completion. Implementations for Bioclipse and Taverna are presented, as are various XMPP cloud services in bio- and cheminformatics. Conclusion: XMPP with its extensions is a powerful protocol for cloud services that demonstrates several advantages over traditional HTTP-based Web services: 1) services are discoverable without the need of an external registry, 2) asynchronous invocation eliminates the need for ad-hoc solutions like polling, and 3) input and output types defined in the service allow for generation of clients on the fly without the need of an external semantics description. The many advantages over existing technologies make XMPP a highly interesting candidate for next generation online services in bioinformatics.
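
The difference between polling and having the service send results back on completion can be illustrated with a minimal Python sketch. It does not use XMPP or the IO Data extension: `service`, `invoke`, and `run_demo` are hypothetical names, and an asyncio future merely stands in for the message that a service would push to the client when the job finishes.

```python
import asyncio


async def service(job_input, on_done):
    """Hypothetical stand-in for a remote service that pushes its result."""
    await asyncio.sleep(0.01)      # simulated computation time
    on_done(job_input.upper())     # the service notifies the client itself


async def invoke(job_input):
    """Client side: submit the job, then wait for the pushed result (no polling loop)."""
    loop = asyncio.get_running_loop()
    result = loop.create_future()  # completed by the service callback
    task = asyncio.create_task(service(job_input, result.set_result))
    value = await result           # suspends until the service reports completion
    await task                     # let the service coroutine finish cleanly
    return value


def run_demo(text):
    """Run one asynchronous invocation from synchronous code."""
    return asyncio.run(invoke(text))
```

The client never asks for status; it is simply resumed when the callback fires, which is the behaviour the abstract contrasts with repeated HTTP polling.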

Maastricht University Research Portal

Crossref

Springer - Publisher Connector

PubMed Central

Open Access LMU

The BioDICE Taverna plugin for clustering and visualization of biological data: a workflow for molecular compounds exploration

Author: A Fiannaca
A Truszkowski
A Ultsch
Alfonso Urso
Antonino Fiannaca
C Borgelt
CA Goble
D Digles
G Di Fatta
Giuseppe Di Fatta
HE Pence
J Hastings
K Wolstencroft
M Hall
Massimo La Rosa
N Belacel
P Ertl
Riccardo Rizzo
S Jupp
S Riniker
Salvatore Gaglio
T Kohonen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Background: In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system that is used to design and execute scientific workflows and aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platform is important to support data-driven scientific discovery in complex and explorative bioinformatics applications. Results: This work presents a Taverna plugin, the Biological Data Interactive Clustering Explorer (BioDICE), that performs clustering of high-dimensional biological data and provides a nonlinear, topology preserving projection for the visualization of the input data and their similarities. The core algorithm in the BioDICE plugin is Fast Learning Self Organizing Map (FLSOM), which is an improved variant of the Self Organizing Map (SOM) algorithm. The plugin generates an interactive 2D map that allows the visual exploration of multidimensional data and the identification of groups of similar objects. The effectiveness of the plugin is demonstrated on a case study related to chemical compounds. Conclusions: The number and variety of available tools and its extensibility have made Taverna a popular choice for the development of scientific data workflows. This work presents a novel plugin, BioDICE, which adds a data-driven knowledge discovery component to Taverna. BioDICE provides an effective and powerful clustering tool, which can be adopted for the explorative analysis of biological datasets.

Central Archive at the University of Reading

Crossref

Springer - Publisher Connector

PubMed Central

Archivio istituzionale della ricerca - Università di Palermo

Exploring Protein-Protein Interactions as Drug Targets for Anti-cancer Therapy with In Silico Workflows

Author: A Goncearenco
A Marchler-Bauer
A Truszkowski
AA Bogan
B Graves
B Ma
BA Shoemaker
BJ Smith
CA Goble
CM Yates
D Petrey
E Cukuroglu
FP Davis
H Perez-Sanchez
HS Haase
J Bhagat
J Cinatl
JA Wells
K Wolstencroft
M Guharoy
M Li
M Petukh
M Tyagi
MK Gilson
MP Mazanetz
N Estrada-Ortiz
P Aloy
P Filippakopoulos
R Mosca
RR Thangudu
S Beisken
S Kim
S Shangary
S Teng
T Rolland
W Yang
WS Valdar
Y Wang
Y Zhao
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

We describe a computational protocol to aid the design of small molecule and peptide drugs that target protein-protein interactions, particularly for anti-cancer therapy. To achieve this goal, we explore multiple strategies, including finding binding hot spots, incorporating chemical similarity and bioactivity data, and sampling similar binding sites from homologous protein complexes. We demonstrate how to combine existing interdisciplinary resources with examples of semi-automated workflows. Finally, we discuss several major problems, including the occurrence of drug-resistant mutations, drug promiscuity, and the design of dual-effect inhibitors.
Affiliation: Goncearenco, Alexander. National Institutes of Health; United States
Affiliation: Li, Minghui. Soochow University; China. National Institutes of Health; United States
Affiliation: Simonetti, Franco Lucio. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina
Affiliation: Shoemaker, Benjamin A. National Institutes of Health; United States
Affiliation: Panchenko, Anna R. National Institutes of Health; United States

Crossref

CONICET Digital

Open Source Workflow Engine for Cheminformatics: From Data Curation to Data Analysis

Author: Kuhn Thomas
Publication venue
Publication date: 01/01/2009

The recent release of large open access chemistry databases into the public domain generates a demand for flexible tools to process them so as to discover new knowledge. To support Open Drug Discovery and Open Notebook Science on top of these data resources, it is desirable for the processing tools to be Open Source and available to everyone. The aim of this project was the development of an Open Source workflow engine to solve crucial cheminformatics problems. As a consequence, the CDK-Taverna project developed in the course of this thesis builds a cheminformatics workflow solution through the combination of different Open Source projects such as Taverna (workflow engine), the Chemistry Development Kit (CDK, cheminformatics library) and Pgchem::Tigress (chemistry database cartridge). The work on this project includes the implementation of over 160 different workers, which focus on cheminformatics tasks. The application of the developed methods to real world problems was the final objective of the project. The validation of Open Source software libraries and of chemical data derived from different databases is mandatory to all cheminformatics workflows. Methods to detect the atom types of chemical structures were used to validate the atom typing of the Chemistry Development Kit and to identify curation problems while processing different public databases, including the EBI drug databases ChEBI and ChEMBL as well as the natural products Chapman & Hall Chemical Database. The CDK atom typing lacks atom types for heavier atoms but fits the needs of databases containing organic substances including natural products. To support combinatorial chemistry, an implementation of a reaction enumeration workflow was realized. It is based on generic reactions with lists of reactants and allows the generation of chemical libraries up to O(1000) molecules. Supervised machine learning techniques (perceptron-type artificial neural networks and support vector machines) were used as a proof of concept for quantitative modelling of adhesive polymer kinetics with the Mathematica GNWI.CIP package. This opens the perspective of an integration of high-level "experimental mathematics" into the CDK-Taverna based scientific pipelining. A chemical diversity analysis based on two different public and one proprietary databases including over 200,000 molecules was a large-scale application of the methods developed. For the chemical diversity analysis, different molecular properties are calculated using the Chemistry Development Kit. The analysis of these properties was performed with Adaptive-Resonance-Theory (ART 2-A algorithm) for an automatic unsupervised classification of open categorical problems. The result shows a similar coverage of the chemical space of the two databases containing natural products (one public, one proprietary) whereas the ChEBI database covers a distinctly different chemical space. As a consequence, these comparisons reveal interesting white spots in the proprietary database. The combination of these results with pharmacological annotations of the molecules leads to further research and modelling activities.

Kölner UniversitätsPublikationsServer

Applications of the InChI in cheminformatics with the CDK and Bioclipse.

Author: Adams Samuel
Berg Arvid
Spjuth Ola
Willighagen Egon L
Publication venue: J Cheminform
Publication date: 01/01/2013

RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
BACKGROUND: The InChI algorithms are written in C++ and not available as a Java library. Integration into software written in Java therefore requires a bridge between C and Java libraries, provided by the Java Native Interface (JNI) technology. RESULTS: We here describe how the InChI library is used in the Bioclipse workbench and the Chemistry Development Kit (CDK) cheminformatics library. To make this possible, a JNI bridge to the InChI library was developed, JNI-InChI, allowing Java software to access the InChI algorithms. By using this bridge, the CDK project packages the InChI binaries in a module and offers easy access from Java using the CDK API. The Bioclipse project packages and offers InChI as a dynamic OSGi bundle that can easily be used by any OSGi-compliant software, in addition to the regular Java Archive and Maven bundles. Bioclipse itself uses the InChI as a key component and calculates it on the fly when visualizing and editing chemical structures. We demonstrate the utility of InChI with various applications in CDK and Bioclipse, such as decision support for chemical liability assessment, tautomer generation, and knowledge aggregation using a linked data approach. CONCLUSIONS: These results show that the InChI library can be used in a variety of Java library dependency solutions, making the functionality easily accessible by Java software, such as in the CDK. The applications show various ways the InChI has been used in Bioclipse to enrich its functionality.

Maastricht University Research Portal

Crossref

Springer - Publisher Connector

Publikationer från Uppsala Universitet

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Apollo (Cambridge)

Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry

Author: A Copestake
A Tiwari
Apache
B Florian
B Ludascher
B Mellebeek
B Muller
BalaKrishna Kolluru
C Kolarik
C Kolrik
C Nobata
C Steinbeck
CJ Rupp
D Banville
D Ferrucci
D Jiao
I Taylor
J Shon
J Wren
JA Townsend
Junichi Tsujii
K Hettne
Lezan Hawizy
M Hassan
N Kemp
P Corbett
P Murray-Rust
Peter Murray-Rust
R Klinger
SG Vellay
Sophia Ananiadou
T Kuhn
T Oinn
Tim J. Hubbard
WJ Wilbur
Y Kano
Y Miyao
Y Tsuruoka
Publication venue: Public Library of Science
Publication date: 01/01/2011

Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established named entity recogniser in the chemistry domain, OSCAR, and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-&-drop mechanism of the graphical user interface of U-Compare. These workflows also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation techniques leads to slightly better performance than the alternatives in terms of named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components, which in turn leads to an increase in Type I or Type II errors, thus lowering the overall performance. On the Sciborg corpus, the workflow based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84% as against 84.23% by OSCAR.
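
The link between Type I/Type II errors and the reported F-scores can be made concrete with a short Python sketch. It assumes the F-score is the usual balanced F1 (harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `precision_recall_f1` and the counts in the usage note are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from entity-level counts.

    tp: correctly recognised entities
    fp: spurious entities (Type I errors)
    fn: missed entities (Type II errors)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, with the made-up counts `precision_recall_f1(80, 20, 20)`, precision and recall are both 0.8, so F1 is 0.8; raising either error count lowers the score, which is the effect attributed to poor tokenisation.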

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The University of Manchester - Institutional Repository

Scientific workflow systems: Pipeline Pilot and KNIME

Author: A Truszkowski
H Patel
J Shon
M Hassan
MA Kappler
T Kuhn
Wendy A. Warr
Publication venue: Springer Science and Business Media LLC
Publication date