Data access and integration in the ISPIDER proteomics grid
Grid computing has great potential for supporting the integration of complex, fast-changing biological data repositories to enable distributed data analysis. One scenario where Grid computing has such potential is provided by proteomics resources, which are rapidly being developed with the emergence of affordable, reliable methods to study the proteome. The protein identifications arising from these methods derive from multiple repositories which need to be integrated to enable uniform access to them. A number of technologies exist which enable these resources to be accessed in a Grid environment, but the independent development of these resources means that significant data integration challenges, such as heterogeneity and schema evolution, have to be met. This paper presents an architecture which supports the combined use of Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools to support distributed data analysis. We discuss the application of this architecture for the integration of several autonomous proteomics data resources.
Bioinformatics service reconciliation by heterogeneous schema transformation
This paper focuses on the problem of bioinformatics service reconciliation in a generic and scalable manner so as to enhance interoperability in a highly evolving field. Using XML as a common representation format, but also supporting existing flat-file representation formats, we propose an approach for the scalable semi-automatic reconciliation of services, possibly invoked from within a scientific workflow tool. Service reconciliation may use the AutoMed heterogeneous data integration system as an intermediary service, or may use AutoMed to produce services that mediate between services. We discuss the application of our approach for the reconciliation of services in an example bioinformatics workflow. The main contribution of this research is an architecture for the scalable reconciliation of bioinformatics services.
Grid-based semantic integration of heterogeneous data resources: Implementation on a HealthGrid
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University. The semantic integration of geographically distributed and heterogeneous data
resources still remains a key challenge in Grid infrastructures. Today's
mainstream Grid technologies hold the promise to meet this challenge in a
systematic manner, making data applications more scalable and manageable. The
thesis conducts a thorough investigation of the problem, the state of the art, and
the related technologies, and proposes an Architecture for Semantic Integration of
Data Sources (ASIDS) addressing the semantic heterogeneity issue. It defines a
simple mechanism for the interoperability of heterogeneous data sources in order
to extract or discover information regardless of their different semantics. The
constituent technologies of this architecture include Globus Toolkit (GT4) and
OGSA-DAI (Open Grid Services Architecture Data Access and Integration)
alongside other web services technologies such as XML (Extensible Markup
Language). To show this, the ASIDS architecture was implemented and tested in a
realistic setting by building an exemplar application prototype on a HealthGrid
(pilot implementation).
The study followed an empirical research methodology and was informed by
extensive literature surveys and a critical analysis of the relevant technologies and
their synergies. The two literature reviews, together with the analysis of the
technology background, have provided a good overview of the current Grid and
HealthGrid landscape, produced some valuable taxonomies, explored new paths
by integrating technologies, and more importantly illuminated the problem and
guided the research process towards a promising solution. Yet the primary
contribution of this research is an approach that uses contemporary Grid
technologies for integrating heterogeneous data resources that have semantically
different data fields (attributes). It has been practically demonstrated (using a
prototype HealthGrid) that discovery in semantically integrated distributed data
sources can be feasible by using mainstream Grid technologies, which have been
shown to have some significant advantages over non-Grid based approaches.