23 research outputs found

    Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data.

    BACKGROUND: There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. RESULTS: Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. 
CONCLUSION: Taverna can be used by data-analysis experts as a generic tool for composing ad hoc analyses of quantitative data, combining scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they offer an approach for other scientists to analyse their own data with tools such as R without having to learn the corresponding programming language.
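The statistical step that the workflow delegates to R via the RShell processor can be illustrated outside Taverna with a minimal per-gene test. This sketch is not the workflow's actual R code; the gene names, expression values, and cutoff below are invented for illustration, and a real analysis would use a proper multiple-testing-aware method.

```python
import math

def welch_t(a, b):
    """Welch's t statistic for two samples with unequal variances."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

# Hypothetical expression matrix: gene -> (control replicates, treated replicates)
expression = {
    "geneA": ([5.1, 5.0, 4.9], [8.2, 8.0, 8.4]),  # strongly up-regulated
    "geneB": ([6.0, 6.1, 5.9], [6.0, 6.2, 5.8]),  # essentially unchanged
}

# Flag genes whose |t| exceeds a deliberately simple cutoff.
CUTOFF = 4.0
flagged = [g for g, (ctrl, trt) in expression.items()
           if abs(welch_t(ctrl, trt)) > CUTOFF]
print(flagged)  # -> ['geneA']
```

In the published workflow this logic lives in an R script run through RServe; the point here is only that the "identify differentially-expressed genes" step is a per-gene statistic plus a threshold.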

    BIRI: a new approach for automatically discovering and indexing available public bioinformatics resources from the literature

    BACKGROUND: The rapid evolution of Internet technologies and the collaborative approaches that dominate the field have stimulated the development of numerous bioinformatics resources. To address this new framework, several initiatives have tried to organize these services and resources. In this paper, we present the BioInformatics Resource Inventory (BIRI), a new approach for automatically discovering and indexing available public bioinformatics resources using information extracted from the scientific literature. The index generated can be automatically updated by adding additional manuscripts describing new resources. We have developed web services and applications to test and validate our approach. It has not been designed to replace current indexes but to extend their capabilities with richer functionalities. RESULTS: We developed a web service to provide a set of high-level query primitives to access the index. The web service can be used by third-party web services or web-based applications. To test the web service, we created a pilot web application to access a preliminary knowledge base of resources. We tested our tool using an initial set of 400 abstracts. Almost 90% of the resources described in the abstracts were correctly classified. More than 500 descriptions of functionalities were extracted. CONCLUSION: These experiments suggest the feasibility of our approach for automatically discovering and indexing current and future bioinformatics resources. Given the domain-independent characteristics of this tool, it is currently being applied by the authors in other areas, such as medical nanoinformatics. BIRI is available at http://edelman.dia.fi.upm.es/biri/.
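The classification step BIRI performs on abstracts can be caricatured with a keyword-matching sketch. The categories and keywords below are hypothetical; BIRI's real pipeline uses information extraction from the literature, not a hand-written word list.

```python
# Hypothetical categories and keyword sets -- invented for illustration;
# the real BIRI index is built by extraction from full abstracts.
CATEGORIES = {
    "alignment": {"alignment", "align", "blast"},
    "database": {"database", "repository", "archive"},
    "visualization": {"visualization", "viewer", "plot"},
}

def classify(description):
    """Return the categories whose keywords appear in a resource description."""
    words = set(description.lower().split())
    return sorted(cat for cat, keys in CATEGORIES.items() if words & keys)

print(classify("A public database and viewer for protein structures"))
# -> ['database', 'visualization']
```

A query-primitive web service like BIRI's would then expose lookups over such category assignments rather than raw text search.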

    Communication Web Services and JAIN-SLEE Integration Challenges

    Mashing up telecommunication and IT resources is the real challenge in supporting the evolution towards the next generation of Web Services. In the telecom world, JAIN-SLEE (JAIN Service Logic Execution Environment) is an emerging standard specification for Java service platforms intended to host value-added services composed of telecom and IT services. In this paper, we describe the StarSLEE platform, which extends JAIN-SLEE to compose JAIN-SLEE services with Web services, and the StarSCE service creation environment, which allows exporting value-added services as communication web services. We also analyse the open issues that must be addressed to introduce Web Services into new telecom service platforms.

    An Introduction to Programming for Bioscientists: A Python-based Primer

    Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in the biosciences utilizes computer programs. The computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.). This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language's usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. 
Starting with basic concepts, such as that of a 'variable', the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences. Comment: 65 pages total, including 45 pages of text, 3 figures, 4 tables, numerous exercises, and 19 pages of Supporting Information; currently in press at PLOS Computational Biology.
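The Hamming-distance exercise the primer builds toward fits in a few lines of the Python it teaches. This is a minimal, command-line version of the idea (the primer's own final version adds a graphical user interface):

```python
def hamming(seq1, seq2):
    """Number of positions at which two equal-length DNA sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be the same length")
    return sum(a != b for a, b in zip(seq1, seq2))

print(hamming("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"))  # -> 7
```

The generator expression inside `sum` counts mismatched positions directly, which is both idiomatic Python and a natural first exercise in iteration.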

    Open Source Workflow Engine for Cheminformatics: From Data Curation to Data Analysis

    The recent release of large open-access chemistry databases into the public domain generates a demand for flexible tools to process them so as to discover new knowledge. To support Open Drug Discovery and Open Notebook Science on top of these data resources, it is desirable for the processing tools to be Open Source and available to everyone. The aim of this project was the development of an Open Source workflow engine to solve crucial cheminformatics problems. As a consequence, the CDK-Taverna project developed in the course of this thesis builds a cheminformatics workflow solution through the combination of different Open Source projects: Taverna (workflow engine), the Chemistry Development Kit (CDK, cheminformatics library) and Pgchem::Tigress (chemistry database cartridge). The work on this project includes the implementation of over 160 different workers focused on cheminformatics tasks. The application of the developed methods to real-world problems was the final objective of the project. The validation of Open Source software libraries and of chemical data derived from different databases is mandatory for all cheminformatics workflows. Methods to detect the atom types of chemical structures were used to validate the atom typing of the Chemistry Development Kit and to identify curation problems while processing different public databases, including the EBI drug databases ChEBI and ChEMBL as well as the natural-products Chapman & Hall Chemical Database. The CDK atom typing lacks types for heavier atoms but fits the needs of databases containing organic substances, including natural products. To support combinatorial chemistry, a reaction enumeration workflow was implemented. It is based on generic reactions with lists of reactants and allows the generation of chemical libraries of up to O(1000) molecules.
Supervised machine learning techniques (perceptron-type artificial neural networks and support vector machines) were used as a proof of concept for quantitative modelling of adhesive polymer kinetics with the Mathematica GNWI.CIP package. This opens the perspective of integrating high-level "experimental mathematics" into CDK-Taverna-based scientific pipelining. A chemical diversity analysis based on two public databases and one proprietary database, together including over 200,000 molecules, was a large-scale application of the methods developed. For the chemical diversity analysis, different molecular properties were calculated using the Chemistry Development Kit. The analysis of these properties was performed with Adaptive Resonance Theory (the ART 2-A algorithm) for automatic unsupervised classification of open categorical problems. The result shows a similar coverage of chemical space by the two databases containing natural products (one public, one proprietary), whereas the ChEBI database covers a distinctly different chemical space. As a consequence, these comparisons reveal interesting white spots in the proprietary database. The combination of these results with pharmacological annotations of the molecules leads to further research and modelling activities.
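The combinatorial core of a reaction-enumeration workflow is a Cartesian product over reactant lists. The sketch below uses strings purely for illustration; the actual CDK-Taverna workers operate on chemical structures via generic reaction definitions, and the reactant names here are invented.

```python
from itertools import product

# Hypothetical reactant lists for a generic two-component coupling.
acids = ["acetic acid", "benzoic acid", "propanoic acid"]
amines = ["methylamine", "aniline"]

# Enumerate every reactant combination, as a reaction-enumeration
# workflow does when building a combinatorial library.
library = [f"{acid} + {amine}" for acid, amine in product(acids, amines)]
print(len(library))  # -> 6, i.e. 3 x 2 combinations
```

Library size grows multiplicatively with each reactant list, which is why even short lists reach the O(1000)-molecule scale the thesis mentions.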

    Trinity College Alumni Magazine, May 1964

    https://digitalrepository.trincoll.edu/reporter/1929/thumbnail.jp

    Casco Bay Weekly : 15 June 1995

    https://digitalcommons.portlandlibrary.com/cbw_1995/1024/thumbnail.jp

    Casco Bay Weekly : 1 June 1995

    https://digitalcommons.portlandlibrary.com/cbw_1995/1022/thumbnail.jp

    Casco Bay Weekly : 25 May 1995

    https://digitalcommons.portlandlibrary.com/cbw_1995/1021/thumbnail.jp

    Admissions Brochure 2003-2004
