1,304 research outputs found

Data access and integration in the ISPIDER proteomics grid

Author: C.A. Goble
E.M. Zdobnov
J. Smith
L.M. Haas
M. Antonioletti
M. Maibaum
P. Buneman
P. McBrien
R.G.G. Cattell
S. Bowers
S. Durinck
S.B. Davidson
T.M. Oinn
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2006

Grid computing has great potential for supporting the integration of complex, fast-changing biological data repositories to enable distributed data analysis. One scenario where Grid computing has such potential is provided by proteomics resources, which are rapidly being developed with the emergence of affordable, reliable methods to study the proteome. The protein identifications arising from these methods derive from multiple repositories which need to be integrated to enable uniform access to them. A number of technologies exist which enable these resources to be accessed in a Grid environment, but the independent development of these resources means that significant data integration challenges, such as heterogeneity and schema evolution, have to be met. This paper presents an architecture which supports the combined use of Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools to support distributed data analysis. We discuss the application of this architecture for the integration of several autonomous proteomics data resources.

CiteSeerX

Birkbeck Institutional Research Online

Distributed BLAST in a grid computing context

Author: D.R. Mathog
K. Hokamp
R. Clifford
R.C. Braun
S.F. Altschul
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2005

The Basic Local Alignment Search Tool (BLAST) is one of the best-known sequence comparison programs available in bioinformatics. It is used to compare query sequences to a set of target sequences, with the intention of finding similar sequences in the target set. Here, we present a distributed BLAST service which operates over a set of heterogeneous Grid resources and is made available through a Globus Toolkit v.3 Grid service. This work has been carried out in the context of the BRIDGES project, a UK e-Science project aimed at providing a Grid-based environment for biomedical research. Input consisting of multiple query sequences is partitioned into sub-jobs on the basis of the number of idle compute nodes available and then processed on these in batches. To achieve this, we have implemented our own Java-based scheduler which distributes sub-jobs across an array of resources utilizing a variety of local job scheduling systems.
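
The partitioning step described in this abstract can be illustrated with a minimal sketch. It is not the BRIDGES scheduler (which the abstract says is Java-based and is not shown in this listing); the function name `partition_queries` and the near-equal, contiguous split are assumptions made purely for illustration.

```python
def partition_queries(queries, idle_nodes):
    """Split a list of query sequences into at most `idle_nodes` sub-jobs.

    Hypothetical helper: sub-jobs are contiguous slices whose sizes differ
    by at most one sequence. The real service may use a different policy.
    """
    if idle_nodes < 1:
        raise ValueError("need at least one idle node")
    if not queries:
        return []
    # Never create more sub-jobs than there are queries.
    n_jobs = min(idle_nodes, len(queries))
    base, extra = divmod(len(queries), n_jobs)
    jobs, start = [], 0
    for i in range(n_jobs):
        # The first `extra` sub-jobs take one additional query each.
        size = base + (1 if i < extra else 0)
        jobs.append(queries[start:start + size])
        start += size
    return jobs
```

Under these assumptions, ten queries and three idle nodes would give sub-jobs of four, three and three sequences, each of which could then be handed to whatever local job scheduler manages the corresponding node.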

CiteSeerX

Enlighten

Agents in Bioinformatics

Author: Luck M
Merelli E
Publication date: 01/01/2005

The scope of the Technical Forum Group (TFG) on Agents in Bioinformatics (BIOAGENTS) was to inspire collaboration between the agent and bioinformatics communities with the aim of creating an opportunity to propose a different (agent-based) approach to the development of computational frameworks both for data analysis in bioinformatics and for system modelling in computational biology. During the day, the participants examined the future of research on agents in bioinformatics primarily through 12 invited talks selected to cover the most relevant topics. From the discussions, it became clear that there are many perspectives to the field, ranging from bio-conceptual languages for agent-based simulation, to the definition of bio-ontology-based declarative languages for use by information agents, and to the use of Grid agents, each of which requires further exploration. The interactions between participants encouraged the development of applications that describe a way of creating agent-based simulation models of biological systems, starting from a hypothesis and inferring new knowledge (or relations) by mining and analysing the huge amount of public biological data. In this report we summarise and reflect on the presentations and discussions.

Southampton (e-Prints Soton)

Archivio istituzionale della ricerca - Università di Camerino

King's Research Portal

High-throughput bioinformatics with the Cyrille2 pipeline system

Author: Datema Erwin
de Groot Joost CW
Fiers Mark WEJ
van der Burgt Ate
van Ham Roeland CHJ
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Modern omics research involves the application of high-throughput technologies that generate vast volumes of data. These data need to be pre-processed, analyzed and integrated with existing knowledge through the use of diverse sets of software tools, models and databases. The analyses are often interdependent and chained together to form complex workflows or pipelines. Given the volume of the data used and the multitude of computational resources available, specialized pipeline software is required to make high-throughput analysis of large-scale omics datasets feasible. Results: We have developed a generic pipeline system called Cyrille2. The system is modular in design and consists of three functionally distinct parts: 1) a web-based graphical user interface (GUI) that enables a pipeline operator to manage the system; 2) the Scheduler, which forms the functional core of the system and which tracks what data enters the system and determines what jobs must be scheduled for execution; and 3) the Executor, which searches for scheduled jobs and executes these on a compute cluster. Conclusion: The Cyrille2 system is an extensible, modular system, implementing the stated requirements. Cyrille2 enables easy creation and execution of high-throughput, flexible bioinformatics pipelines.
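
The division of labour between a scheduler and an executor described in this abstract can be sketched roughly as follows. This is not Cyrille2 code: the `Pipeline` class, the `schedule` and `execute` functions, and the single-upstream dependency model are all hypothetical simplifications, and jobs run in-process rather than on a compute cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    # Hypothetical model: step name -> upstream step it depends on
    # (None means the step consumes raw input data).
    steps: dict
    done: set = field(default_factory=set)      # (step, item) jobs already run
    queue: list = field(default_factory=list)   # (step, item) jobs awaiting execution


def schedule(pipeline, items):
    """Scheduler role: queue every job whose upstream result exists for an item."""
    for item in items:
        for step, upstream in pipeline.steps.items():
            job = (step, item)
            ready = upstream is None or (upstream, item) in pipeline.done
            if ready and job not in pipeline.done and job not in pipeline.queue:
                pipeline.queue.append(job)


def execute(pipeline, run):
    """Executor role: pick up scheduled jobs, run them, and record completion."""
    executed = []
    while pipeline.queue:
        step, item = pipeline.queue.pop(0)
        run(step, item)  # a real executor would submit this to a cluster
        pipeline.done.add((step, item))
        executed.append((step, item))
    return executed
```

In this sketch, alternating calls to `schedule` and `execute` move each data item through the pipeline one step at a time; keeping the two roles separate means the scheduling logic does not need to know where or how jobs are actually run.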

Springer - Publisher Connector

Directory of Open Access Journals

SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters

Author: Lefkowitz Elliot J
Wang Chunlin
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high performance computation becomes necessary. RESULTS: We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper utilizes a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It takes into consideration load balancing between each node on the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM, using the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments using QS-search to optimize BLAST and HMMPFAM showed that QS-search accelerated the performance of these programs almost linearly in proportion to the number of CPUs used. We have also implemented a wrapper that utilizes a database segmentation approach (DS-BLAST) that provides a complementary solution for BLAST searches when the database is too large to fit into the memory of a single node. CONCLUSIONS: Used together, QS-search and DS-BLAST provide a flexible solution to adapt sequential similarity searching applications in high performance computing environments. Their ease of use and their ability to wrap a variety of database search programs provide an analytical architecture to assist both the seasoned bioinformaticist and the wet-bench biologist.
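
The abstract does not say how QS-search balances load across nodes. As one possible illustration only, the sketch below assigns query sequences to nodes greedily by length (longest first, always to the least-loaded node), on the assumption that search time grows with query length. The function name `balance_queries` and this heuristic are not taken from SS-Wrapper.

```python
import heapq


def balance_queries(seq_lengths, n_nodes):
    """Assign query IDs to nodes so that total sequence length per node is similar.

    seq_lengths: dict mapping query ID -> sequence length (hypothetical input).
    Returns (bins, loads): the query IDs and the summed length for each node.
    """
    if n_nodes < 1:
        raise ValueError("need at least one node")
    # Min-heap of (current load, node index); ties go to the lower node index.
    heap = [(0, node) for node in range(n_nodes)]
    heapq.heapify(heap)
    bins = [[] for _ in range(n_nodes)]
    # Longest queries first; ties broken by ID for a deterministic result.
    ordered = sorted(seq_lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    for seq_id, length in ordered:
        load, node = heapq.heappop(heap)
        bins[node].append(seq_id)
        heapq.heappush(heap, (load + length, node))
    loads = [0] * n_nodes
    for load, node in heap:
        loads[node] = load
    return bins, loads
```

Each node's bin would then be written to its own query file and passed unchanged to the wrapped search program, which matches the abstract's point that the original program is not altered.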

Springer - Publisher Connector

Directory of Open Access Journals

From access and integration to mining of secure genomic data sets across the grid

Author: Sinnott R.O.
Publication venue: Elsevier BV
Publication date: 01/01/2007

The UK Department of Trade and Industry (DTI) funded BRIDGES project (Biomedical Research Informatics Delivered by Grid Enabled Services) has developed a Grid infrastructure to support cardiovascular research. This includes the provision of a compute Grid and a data Grid infrastructure with security at its heart. In this paper we focus on the BRIDGES data Grid. A primary aim of the BRIDGES data Grid is to help control the complexity in access to and integration of a myriad of genomic data sets through simple Grid-based tools. We outline these tools and how they are delivered to the end-user scientists. We also describe how these tools are to be extended in the BBSRC-funded Grid Enabled Microarray Expression Profile Search (GEMEPS) project to provide a richer vocabulary of search capabilities for mining microarray data sets. As with BRIDGES, fine-grained Grid security underpins GEMEPS.

Enlighten

SNP locator: a candidate SNP selection tool

Author: Aguiar-Pulido Vanessa
Cabarcos Alba
Dorado Julián
Quintela Sonsoles
Rabuñal Juan R.
Seoane José A.
Publication venue: Inderscience Publishers
Publication date: 01/01/2013

In this work, a data integration approach using a federated model based on a service-oriented architecture (SOA) is presented. The BioMOBY middleware was used to implement each service which is part of the integration process. As an example of usage of this architecture, a web tool for candidate SNP selection has been developed. Thus, several BioMOBY services have been created as the model layer of the web application. Each data source has a wrapper which communicates with the federated model, that is, the BioMOBY model, and this model is the one that interacts with the client.

Funding: Red Gallega de Investigación sobre Cáncer Colorrectal (Ref. 2009/58); Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo (209RT-0366); Instituto de Salud Carlos III (PIO52048); Instituto de Salud Carlos III (RD07/0067/0005); Galicia. Consellería de Economía e Industria (10SIN105004PR); Ministerio de Industria, Turismo y Comercio (TSI-020110-2009-5)

University of Miami: Scholarship Miami

BioMAJ: a flexible framework for databanks synchronization and processing

Author: A. Assi
C. Caron
D. Allouche
Etzold
H. Leroy
J.-M. Larre
L. Legrand
O. Collin
O. Filangi
V. Martin
Y. Beausse
Publication venue: Oxford University Press
Publication date: 01/01/2008

Large- and medium-scale computational molecular biology projects require accurate bioinformatics software and numerous heterogeneous biological databanks, which are distributed around the world. BioMAJ provides a flexible, robust, fully automated environment for managing such massive amounts of data. The Java application enables automation of the data update cycle process and supervision of the locally mirrored data repository. We have developed workflows that handle some of the most commonly used bioinformatics databases. A set of scripts is also available for post-synchronization data treatment consisting of indexation or format conversion (for NCBI BLAST, SRS, EMBOSS, GCG, etc.). BioMAJ can be easily extended by personal homemade processing scripts. Source history can be kept via HTML reports containing statements of locally managed databanks.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

ProdInra

Hal-Diderot

HAL-Rennes 1