9 research outputs found

Semantic Web integration of Cheminformatics resources with the SADI framework

Author: A McNaught
B Chen
BP Vandervalk
C Steinbeck
CA Lipinski
DDG Gessler
E Benfenati
F Belleau
J Kietz
Leonid L Chepelev
M DiBernardo
MD Wilkinson
Michel Dumontier
P Lord
PB Neerincx
R Guha
R Stevens
T Kuhn
T Vitvar
Publication venue: BioMed Central
Publication date: 01/05/2011

Abstract. Background: The diversity and the largely independent nature of chemical research efforts over the past half century are, most likely, the major contributors to the current poor state of chemical computational resource and database interoperability. While open software for chemical format interconversion and database entry cross-linking has partially addressed database interoperability, computational resource integration is hindered by the great diversity of software interfaces, languages, access methods, and platforms, among others. This has, in turn, translated into limited reproducibility of computational experiments and the need for application-specific computational workflow construction and semi-automated enactment by human experts, especially where emerging interdisciplinary fields, such as systems chemistry, are pursued. Fortunately, the advent of the Semantic Web and the very recent introduction of RESTful Semantic Web Services (SWS) may present an opportunity to integrate all of the existing computational and database resources in chemistry into a machine-understandable, unified system that draws on the entirety of the Semantic Web. Results: We have created a prototype framework of Semantic Automated Discovery and Integration (SADI) SWS that exposes the QSAR descriptor functionality of the Chemistry Development Kit. Since each of these services has formal ontology-defined input and output classes, and each service consumes and produces RDF graphs, clients can automatically reason about the services and available reference information necessary to complete a given overall computational task specified through a simple SPARQL query. We demonstrate this capability by carrying out QSAR analysis backed by a simple formal ontology to determine whether a given molecule is drug-like. Further, we discuss parameter-based control over the execution of SADI SWS. Finally, we demonstrate the value of computational resource envelopment as SADI services through service reuse and ease of integration of computational functionality into formal ontologies. Conclusions: The work we present here may trigger a major paradigm shift in the distribution of computational resources in chemistry. We conclude that envelopment of chemical computational resources as SADI SWS facilitates interdisciplinary research by enabling the definition of computational problems in terms of ontologies and formal logical statements instead of cumbersome and application-specific tasks and workflows.
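The drug-likeness determination mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes the criterion is Lipinski's rule of five (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors, with one violation tolerated); the abstract itself does not name the rule or the ontology's exact definition. The `Descriptors` container, the function names, and the descriptor values are hypothetical and stand in for values that a descriptor service would return; this is not the SADI or Chemistry Development Kit API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Hypothetical container for precomputed descriptor values of one molecule."""
    molecular_weight: float   # in daltons
    log_p: float              # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    return sum([
        d.molecular_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed criterion: drug-like if at most `max_violations` thresholds are exceeded."""
    return rule_of_five_violations(d) <= max_violations


# Illustrative (made-up) descriptor values, not measurements of real compounds.
small_molecule = Descriptors(molecular_weight=180.0, log_p=1.2,
                             h_bond_donors=1, h_bond_acceptors=4)
large_molecule = Descriptors(molecular_weight=820.0, log_p=6.5,
                             h_bond_donors=7, h_bond_acceptors=9)

print(is_drug_like(small_molecule))
print(is_drug_like(large_molecule))
```

In the approach the abstract describes, the thresholds would live in an ontology and the descriptor values would be fetched from services discovered at query time, rather than being hard-coded as here.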

Maastricht University Research Portal

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines

Author: A Gil
A Luckow
A Tiwari
B Ludäscher
BC Pierce
BP Vandervalk
C Lin
Cameron Mura
CS Soto
D Earl
D Frishman
E Bartocci
E Deelman
J Dean
J Eker
J Misra
J Orvis
JP Morrison
K Hinsen
K Jeffay
M Halling-Brown
Marcin Cieślik
MC Schatz
MR Berthold
MWEJ Fiers
P Liu
P Romano
S Hoon
S Kannan
T Oinn
T Tu
U Radetzki
W Van der Aalst
WM Johnston
Z Yao
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results: To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain-specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). Conclusions: PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy can also be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples.
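The batching trade-off described in the abstract above (larger batches give more parallelism, smaller batches keep evaluation lazy and memory use low) can be sketched with the Python standard library alone. This is not the PaPy API: `batched_map`, the two stage functions, and the sample sequences are hypothetical names and data chosen for illustration, and a thread pool stands in for the pooled local or remote compute resources.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def batched_map(func: Callable[[T], U], items: Iterable[T],
                batch_size: int, workers: int = 4) -> Iterator[U]:
    """Lazily map `func` over `items`, evaluating one batch at a time in parallel.

    Only `batch_size` inputs are held in memory at once; results keep input order.
    """
    iterator = iter(items)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            batch = list(islice(iterator, batch_size))
            if not batch:
                return
            yield from pool.map(func, batch)


def reverse_complement(seq: str) -> str:
    """First pipeline stage: reverse-complement a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_fraction(seq: str) -> float:
    """Second pipeline stage: fraction of G and C bases."""
    return (seq.count("G") + seq.count("C")) / len(seq)


sequences = ["ACGT", "GGCC", "AATT", "GATTACA"]

# Two stages chained as generators: nothing runs until results are consumed.
stage_one = batched_map(reverse_complement, sequences, batch_size=2)
stage_two = batched_map(gc_fraction, stage_one, batch_size=2)
results = list(stage_two)
print(results)
```

With `batch_size=1` the pipeline is fully lazy but processes one item at a time; raising it lets more items run concurrently at the cost of holding more inputs and results in memory.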

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient

Author: A Prlić
A Riek
A Stoltzfus
A Vilella
AA Popescu
Aaron Steele
Arlin Stoltzfus
B Boyle
B Smith
BD Shenoy
BP Vandervalk
Brian O'Meara
Brian Sidlauskas
CA Stewart
Campbell O Webb
Christian M Zmasek
CM Zmasek
CO Webb
CS Parr
D Maddison
D McDonald
DJ Patterson
DR Maddison
Emily Jane McTavish
Enrico Pontelli
EW Sayers
F Prosdocimi
FA Matsen
Foundation FS
G Klyne
Gaurav Vaidya
Greg Jordan
H Martinson
Helena Deus
Hilmar Lapp
Holly M Bik
J Cannone
J Dean
J Felsenstein
J Goecks
J Leebens-Mack
J Ruan
J Sukumaran
James P Balhoff
Jeet Sukumaran
Joseph W Brown
JP Doyon
Karen Cranston
Luke J Harmon
M Han
M Heymans
M Pagel
M Sanderson
MA Miller
MA O'Leary
Mark Westneat
Matthew W Pennell
Megan Pirrung
Michael E Alfaro
Michael S Rosenberg
MJ Sanderson
MM Smolenaars
Naim Matasci
OR Bininda-Emonds
PA Goloboff
Peter E Midford
PO Larsen
PO Lewis
R Stelkens
RA Vos
RD Page
RS Voss
Rutger Vos
S Kumar
S Kummerfeld
S Urbanek
SA Smith
SB Hedges
Siavash Mirarab
SM Farris
T Berners-Lee
T Hughes-Croucher
The Angiosperm Phylogeny G
Tracy A Heath
W Piel
World Wide Web Consortium
Publication venue: Springer Science and Business Media LLC

Crossref

Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data.

Author: Birol I
Bohlmann J
Bousquet J
Boyle B
Brand D
Coope R
Jackman SD
Jaquish B
Jones SJ
Keeling CI
Kirk H
Mackay J
Moore RA
Mungall AJ
Pandoh P
Pleasance S
Raymond A
Ritland C
Ritland K
Taylor GA
Vandervalk BP
Yanchuk A
Yuen MM
Zhao Y
Publication date: 01/01/2013

White spruce (Picea glauca) is a dominant conifer of the boreal forests of North America, and providing genomics resources for this commercially valuable tree will help improve forest management and conservation efforts. Sequencing and assembling the large and highly repetitive spruce genome, though, pushes the boundaries of the current technology. Here, we describe a whole-genome shotgun sequencing strategy using two Illumina sequencing platforms and an assembly approach using the ABySS software. We report a 20.8 gigabase pair draft genome in 4.9 million scaffolds, with a scaffold N50 of 20,356 bp. We demonstrate how recent improvements in the sequencing technology, especially increasing read lengths and paired-end reads from longer fragments, have a major impact on the assembly contiguity. We also note that scalable bioinformatics tools are instrumental in providing rapid draft assemblies. Availability: The Picea glauca genome sequencing and assembly data are available through NCBI (Accession#: ALWZ0100000000 PID: PRJNA83435). http://www.ncbi.nlm.nih.gov/bioproject/83435
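The scaffold N50 reported in the abstract above is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the total assembled bases. A minimal Python sketch of that definition follows; the function name `n50` and the toy lengths are illustrative and are not taken from the spruce assembly or from ABySS.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Walk the lengths from longest to shortest and return the first length at
    which the running total reaches at least half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 is undefined for an empty assembly")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:   # integer test avoids floating-point halves
            return length
    raise AssertionError("unreachable: the loop always returns")


# Toy example: total is 54, and 10 + 9 + 8 = 27 reaches half of it.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

A higher N50 for the same total size means the assembled bases sit in fewer, longer scaffolds, which is the sense in which the abstract speaks of assembly contiguity.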

Oxford University Research Archive