99 research outputs found

    Data access and integration in the ISPIDER proteomics grid

    Get PDF
    Grid computing has great potential for supporting the integration of complex, fast changing biological data repositories to enable distributed data analysis. One scenario where Grid computing has such potential is provided by proteomics resources which are rapidly being developed with the emergence of affordable, reliable methods to study the proteome. The protein identifications arising from these methods derive from multiple repositories which need to be integrated to enable uniform access to them. A number of technologies exist which enable these resources to be accessed in a Grid environment, but the independent development of these resources means that significant data integration challenges, such as heterogeneity and schema evolution, have to be met. This paper presents an architecture which supports the combined use of Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools to support distributed data analysis. We discuss the application of this architecture for the integration of several autonomous proteomics data resources

    Enabling quantitative data analysis through e-infrastructures

    Get PDF
    This paper discusses how quantitative data analysis in the social sciences can engage with and exploit an e-Infrastructure. We highlight how a number of activities which are central to quantitative data analysis, referred to as ‘data management’, can benefit from e-infrastructure support. We conclude by discussing how these issues are relevant to the DAMES (Data Management through e-Social Science) research Node, an ongoing project that aims to develop e-Infrastructural resources for quantitative data analysis in the social sciences

    Role-Based Access Control for the Open Grid Services Architecture - Data Access and Integration (OGSA-DAI)

    Get PDF
    Grid has emerged recently as an integration infrastructure for the sharing and coordinated use of diverse resources in dynamic, distributed virtual organizations (VOs). A Data Grid is an architecture for the access, exchange, and sharing of data in the Grid environment. In this dissertation, role-based access control (RBAC) systems for heterogeneous data resources in Data Grid systems are proposed. The Open Grid Services Architecture - Data Access and Integration (OGSA-DAI) is a widely used framework for the integration of heterogeneous data resources in Grid systems. However, in the OGSA-DAI system, access control causes substantial administration overhead for resource providers in VOs because each of them has to manage the authorization information for individual Grid users. Its identity-based access control mechanisms are severely inefficient and too complicated to manage because the direct mapping between users and privileges is transitory. To solve this problem, (1) the Community Authorization Service (CAS), provided by the Globus toolkit, and (2) the Shibboleth, an attribute authorization service, are used to support RBAC in the OGSA-DAI system. The Globus Toolkit is widely used software for building Grid systems. Access control policies need to be specified and managed across multiple VOs. For this purpose, the Core and Hierarchical RBAC profile of the eXtensible Access Control Markup Language (XACML) is used; and for distributed administration of those policies, the Object, Metadata and Artifacts Registry (OMAR) is used. OMAR is based on the e-business eXtensible Markup Language (ebXML) registry specifications developed to achieve interoperable registries and repositories. The RBAC systems allow quick and easy deployments, privacy protection, and the centralized and distributed management of privileges. They support scalable, interoperable and fine-grain access control services; dynamic delegation of rights; and user-role assignments. They also reduce the administration overheads for resource providers because they need to maintain only the mapping information from VO roles to local database roles. Resource providers maintain the ultimate authority over their resources. Moreover, unnecessary mapping and connections can be avoided by denying invalid requests at the VO level. Performance analysis shows that our RBAC systems add only a small overhead to the existing security infrastructure of OGSA-DAI

    Proactive software rejuvenation solution for web enviroments on virtualized platforms

    Get PDF
    The availability of the Information Technologies for everything, from everywhere, at all times is a growing requirement. We use information Technologies from common and social tasks to critical tasks like managing nuclear power plants or even the International Space Station (ISS). However, the availability of IT infrastructures is still a huge challenge nowadays. In a quick look around news, we can find reports of corporate outage, affecting millions of users and impacting on the revenue and image of the companies. It is well known that, currently, computer system outages are more often due to software faults, than hardware faults. Several studies have reported that one of the causes of unplanned software outages is the software aging phenomenon. This term refers to the accumulation of errors, usually causing resource contention, during long running application executions, like web applications, which normally cause applications/systems to hang or crash. Gradual performance degradation could also accompany software aging phenomena. The software aging phenomena are often related to memory bloating/ leaks, unterminated threads, data corruption, unreleased file-locks or overruns. We can find several examples of software aging in the industry. The work presented in this thesis aims to offer a proactive and predictive software rejuvenation solution for Internet Services against software aging caused by resource exhaustion. To this end, we first present a threshold based proactive rejuvenation to avoid the consequences of software aging. This first approach has some limitations, but the most important of them it is the need to know a priori the resource or resources involved in the crash and the critical condition values. Moreover, we need some expertise to fix the threshold value to trigger the rejuvenation action. Due to these limitations, we have evaluated the use of Machine Learning to overcome the weaknesses of our first approach to obtain a proactive and predictive solution. Finally, the current and increasing tendency to use virtualization technologies to improve the resource utilization has made traditional data centers turn into virtualized data centers or platforms. We have used a Mathematical Programming approach to virtual machine allocation and migration to optimize the resources, accepting as many services as possible on the platform while at the same time, guaranteeing the availability (via our software rejuvenation proposal) of the services deployed against the software aging phenomena. The thesis is supported by an exhaustive experimental evaluation that proves the effectiveness and feasibility of our proposals for current systems

    Supporting security-oriented, inter-disciplinary research: crossing the social, clinical and geospatial domains

    Get PDF
    How many people have had a chronic disease for longer than 5-years in Scotland? How has this impacted upon their choices of employment? Are there any geographical clusters in Scotland where a high-incidence of patients with such long-term illness can be found? How does the life expectancy of such individuals compare with the national averages? Such questions are important to understand the health of nations and the best ways in which health care should be delivered and measured for their impact and success. In tackling such research questions, e-Infrastructures need to provide tailored, secure access to an extensible range of distributed resources including primary and secondary e-Health clinical data; social science data, and geospatial data sets amongst numerous others. In this paper we describe the security models underlying these e-Infrastructures and demonstrate their implementation in supporting secure, federated access to a variety of distributed and heterogeneous data sets exploiting the results of a variety of projects at the National e-Science Centre (NeSC) at the University of Glasgow

    Efficient replication of large volumes of data and maintaining data consistency by using P2P techniques in Desktop Grid

    Get PDF
    Desktop Grid is increasing in popularity because of relatively very low cost and good performance in institutions. Data-intensive applications require data management in scientific experiments conducted by researchers and scientists in Desktop Grid-based Distributed Computing Infrastructure (DCI). Some of these data-intensive applications deal with large volumes of data. Several solutions for data-intensive applications have been proposed for Desktop Grid (DG) but they are not efficient in handling large volumes of data. Data management in this environment deals with data access and integration, maintaining basic properties of databases, architecture for querying data, etc. Data in data-intensive applications has to be replicated in multiple nodes for improving data availability and reducing response time. Peer-to-Peer (P2P) is a well established technique for handling large volumes of data and is widely used on the internet. Its environment is similar to the environment of DG. The performance of existing P2P-based solution dealing with generic architecture for replicating large volumes of data is not efficient in DG-based DCI. Therefore, there is a need for a generic architecture for replicating large volumes of data efficiently by using P2P in BOINC based Desktop Grid. Present solutions for data-intensive applications mainly deal with read only data. New type of applications are emerging which deal large volumes of data and Read/Write of data. In emerging scientific experiments, some nodes of DG generate new snapshot of scientific data after regular intervals. This new snapshot of data is generated by updating some of the values of existing data fields. This updated data has to be synchronised in all DG nodes for maintaining data consistency. The performance of data management in DG can be improved by addressing efficient data replication and consistency. Therefore, there is need for algorithms which deal with data Read/Write consistency along with replication for large volumes of data in BOINC based Desktop Grid. The research is to identify efficient solutions for data replication in handling large volumes of data and maintaining Read/Write data consistency using Peer-to-Peer techniques in BOINC based Desktop Grid. This thesis presents the solutions that have been carried out to complete the research

    Dynamic Integration of Evolving Distributed Databases using Services

    Get PDF
    This thesis investigates the integration of many separate existing heterogeneous and distributed databases which, due to organizational changes, must be merged and appear as one database. A solution to some database evolution problems is presented. It presents an Evolution Adaptive Service-Oriented Data Integration Architecture (EA-SODIA) to dynamically integrate heterogeneous and distributed source databases, aiming to minimize the cost of the maintenance caused by database evolution. An algorithm, named Relational Schema Mapping by Views (RSMV), is designed to integrate source databases that are exposed as services into a pre-designed global schema that is in a data integrator service. Instead of producing hard-coded programs, views are built using relational algebra operations to eliminate the heterogeneities among the source databases. More importantly, the definitions of those views are represented and stored in the meta-database with some constraints to test their validity. Consequently, the method, called Evolution Detection, is then able to identify in the meta-database the views affected by evolutions and then modify them automatically. An evaluation is presented using case study. Firstly, it is shown that most types of heterogeneity defined in this thesis can be eliminated by RSMV, except semantic conflict. Secondly, it presents that few manual modification on the system is required as long as the evolutions follow the rules. For only three types of database evolutions, human intervention is required and some existing views are discarded. Thirdly, the computational cost of the automatic modification shows a slow linear growth in the number of source database. Other characteristics addressed include EA-SODIA’ scalability, domain independence, autonomy of source databases, and potential of involving other data sources (e.g.XML). Finally, the descriptive comparison with other data integration approaches is presented. It shows that although other approaches may provide better performance of query processing in some circumstances, the service-oriented architecture provide better autonomy, flexibility and capability of evolution

    Bioinformatics Solutions for Image Data Processing

    Get PDF
    In recent years, the increasing use of medical devices has led to the generation of large amounts of data, including image data. Bioinformatics solutions provide an effective approach for image data processing in order to retrieve information of interest and to integrate several data sources for knowledge extraction; furthermore, images processing techniques support scientists and physicians in diagnosis and therapies. In addition, bioinformatics image analysis may be extended to support several scenarios, for instance, in cyber-security the biometric recognition systems are applied to unlock devices and restricted areas, as well as to access sensitive data. In medicine, computational platforms generate high amount of data from medical devices such as Computed Tomography (CT), and Magnetic Resonance Imaging (MRI); this chapter will survey on bioinformatics solutions and toolkits for medical imaging in order to suggest an overview of techniques and methods that can be applied for the imaging analysis in medicine

    Making distributed computing infrastructures interoperable and accessible for e-scientists at the level of computational workflows

    Get PDF
    As distributed computing infrastructures evolve, and as their take up by user communities is growing, the importance of making different types of infrastructures based on a heterogeneous set of middleware interoperable is becoming crucial. This PhD submission, based on twenty scientific publications, presents a unique solution to the challenge of the seamless interoperation of distributed computing infrastructures at the level of workflows. The submission investigates workflow level interoperation inside a particular workflow system (intra-workflow interoperation), and also between different workflow solutions (inter-workflow interoperation). In both cases the interoperation of workflow component execution and the feeding of data into these components workflow components are considered. The invented and developed framework enables the execution of legacy applications and grid jobs and services on multiple grid systems, the feeding of data from heterogeneous file and data storage solutions to these workflow components, and the embedding of non-native workflows to a hosting meta-workflow. Moreover, the solution provides a high level user interface that enables e-scientist end-users to conveniently access the interoperable grid solutions without requiring them to study or understand the technical details of the underlying infrastructure. The candidate has also developed an application porting methodology that enables the systematic porting of applications to interoperable and interconnected grid infrastructures, and facilitates the exploitation of the above technical framework
    • …
    corecore