Grid computing has great potential for supporting the integration of complex, fast-changing biological data repositories to enable distributed data analysis. One scenario where Grid computing has such potential is provided by proteomics resources, which are rapidly being developed with the emergence of affordable, reliable methods to study the proteome. The protein identifications arising from these methods derive from multiple repositories which need to be integrated to enable uniform access to them. A number of technologies exist which enable these resources to be accessed in a Grid environment, but the independent development of these resources means that significant data integration challenges, such as heterogeneity and schema evolution, have to be met. This paper presents an architecture which supports the combined use of Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools to support distributed data analysis. We discuss the application of this architecture for the integration of several autonomous proteomics data resources.



Zamboulis, Lucas

Fan, Hao

Belhajjame, Khalid

Siepen, Jennifer

Jones, Andrew

Martin, Nigel

Poulovassilis, Alexandra

Hubbard, Simon

Embury, Suzanne M.

Paton, Norman W.


Data access and integration in the ISPIDER proteomics grid

Birkbeck Institutional Research Online



CiteSeerX

1 Introduction

Grid computing technologies are becoming established which enable distributed computational and data resources to be accessed in a service-based environment. In the life sciences, these technologies offer the possibility of analysis of complex distributed post-genomic resources. To support transparent access, however, such heterogeneous resources need to be integrated rather than simply accessed in a distributed fashion. This paper presents an architecture for such integration and discusses the application of this architecture for the integration of several autonomous proteomics resources.
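To make the distinction between distributed access and integration concrete, the following is a minimal Python sketch, not the ISPIDER implementation. It assumes two hypothetical proteomics sources whose records use different field names for the same concepts. The source names, field names, and records are invented for illustration, and none of the OGSA-DAI, OGSA-DQP or AutoMed APIs is used. Each source is mapped onto one shared schema so that a single query can run over both.

```python
# Illustrative only: source names, field names and records are hypothetical.
SOURCE_A = [
    {"prot_acc": "P1", "pep_seq": "AAK", "score": 41.0},
    {"prot_acc": "P2", "pep_seq": "GGR", "score": 12.5},
]
SOURCE_B = [
    {"accession": "P1", "peptide": "LLK", "ion_score": 55.0},
]

# Per-source mappings from local field names to the shared schema.
# Assumption: the two score fields are directly comparable.
MAPPINGS = {
    "A": {"prot_acc": "accession", "pep_seq": "peptide", "score": "score"},
    "B": {"accession": "accession", "peptide": "peptide", "ion_score": "score"},
}


def to_global(record, mapping, source):
    """Rename a source record's fields to the shared schema and tag its origin."""
    row = {mapping[field]: value for field, value in record.items()}
    row["source"] = source
    return row


def integrated_view():
    """Yield every record from both sources in the shared schema."""
    for record in SOURCE_A:
        yield to_global(record, MAPPINGS["A"], "A")
    for record in SOURCE_B:
        yield to_global(record, MAPPINGS["B"], "B")


def peptides_for(accession, min_score=0.0):
    """One query over the integrated view, independent of source field names."""
    return sorted(
        (row["peptide"], row["source"])
        for row in integrated_view()
        if row["accession"] == accession and row["score"] >= min_score
    )
```

Here a call such as `peptides_for("P1")` draws on both sources without the caller knowing either source's local field names. In a design of this kind, a change to one source's schema would be absorbed by updating only that source's mapping.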

© Springer-Verlag Berlin Heidelberg 2006


The University of Manchester - Institutional Repository


Crossref


http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.96.8354
