
    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication, and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems, not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems so as to better understand their goals and methodology, which helps in evaluating their applicability to similar problems. The taxonomy also provides a "gap analysis" of the area, through which researchers can identify new issues for investigation. We hope that the proposed taxonomy and mapping give new practitioners an accessible way into this complex area of research. Comment: 46 pages, 16 figures, Technical Report
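
    The mapping exercise the abstract describes can be pictured concretely. The Python sketch below encodes a taxonomy as a set of admissible values per dimension, tags each surveyed system with one value per dimension, and lets the "gap analysis" fall out as the values no surveyed system covers. The dimension values and system names are illustrative placeholders, not the paper's actual categories.

        TAXONOMY = {
            "architecture":   {"hierarchical", "federated", "hybrid"},
            "data_transport": {"gridftp", "http", "custom"},
            "replication":    {"static", "dynamic"},
            "scheduling":     {"data-aware", "compute-only"},
        }

        SYSTEMS = {
            "SystemA": {"architecture": "hierarchical", "data_transport": "gridftp",
                        "replication": "dynamic", "scheduling": "data-aware"},
            "SystemB": {"architecture": "federated", "data_transport": "http",
                        "replication": "static", "scheduling": "compute-only"},
        }

        def gap_analysis(taxonomy, systems):
            # For each dimension, report the values no surveyed system exhibits.
            covered = {dim: {tags[dim] for tags in systems.values()} for dim in taxonomy}
            return {dim: values - covered[dim] for dim, values in taxonomy.items()}

        print(gap_analysis(TAXONOMY, SYSTEMS))
        # -> {'architecture': {'hybrid'}, 'data_transport': {'custom'}, ...}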

    Efficient replication of large volumes of data and maintaining data consistency by using P2P techniques in Desktop Grid

    Desktop Grids are increasingly popular in institutions because of their very low cost and good performance. Data-intensive applications in scientific experiments conducted on Desktop Grid-based Distributed Computing Infrastructure (DCI) require data management, and some of these applications deal with large volumes of data. Several solutions for data-intensive applications have been proposed for Desktop Grids (DG), but they are not efficient at handling large volumes of data. Data management in this environment covers data access and integration, maintaining the basic properties of databases, architectures for querying data, and so on. Data in data-intensive applications has to be replicated across multiple nodes to improve availability and reduce response time. Peer-to-Peer (P2P) is a well-established technique for handling large volumes of data, is widely used on the internet, and operates in an environment similar to that of a DG. However, the existing P2P-based solution offering a generic architecture for replicating large volumes of data is not efficient in DG-based DCI, so a generic architecture is needed for replicating large volumes of data efficiently using P2P in a BOINC-based Desktop Grid. Moreover, present solutions for data-intensive applications mainly deal with read-only data, whereas new applications are emerging that deal with large volumes of data and read/write access. In emerging scientific experiments, some DG nodes generate a new snapshot of the scientific data at regular intervals by updating some values of existing data fields, and this updated data has to be synchronised across all DG nodes to maintain consistency. The performance of data management in a DG can therefore be improved by addressing efficient data replication and consistency, which calls for algorithms that handle read/write consistency along with replication of large volumes of data in a BOINC-based Desktop Grid. This research identifies efficient solutions for replicating large volumes of data and maintaining read/write consistency using Peer-to-Peer techniques in a BOINC-based Desktop Grid; this thesis presents the work carried out to complete the research. A toy illustration of the consistency problem follows below.
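
    As one concrete illustration of the read/write consistency problem the thesis targets, the Python sketch below propagates versioned snapshots between peers in a gossip-style exchange: the node holding the older snapshot adopts the newer one. This is a minimal sketch under assumed semantics (monotonic version numbers, full-snapshot transfer), not the thesis's actual algorithm or any BOINC API.

        import random

        class Node:
            def __init__(self, name):
                self.name = name
                self.version = 0    # version of the snapshot this node holds
                self.snapshot = {}  # the scientific data fields themselves

            def publish(self, updates):
                # A generating node produces a new snapshot at a regular interval
                # by updating some values of the existing data fields.
                self.snapshot.update(updates)
                self.version += 1

            def sync_with(self, peer):
                # Whichever side holds the older snapshot adopts the newer one.
                if peer.version > self.version:
                    self.snapshot, self.version = dict(peer.snapshot), peer.version
                elif self.version > peer.version:
                    peer.snapshot, peer.version = dict(self.snapshot), self.version

        nodes = [Node("dg%d" % i) for i in range(5)]
        nodes[0].publish({"field_a": 21.4})   # one node generates a new snapshot
        for _ in range(10):                   # a few random pairwise gossip rounds
            a, b = random.sample(nodes, 2)
            a.sync_with(b)
        print(all(n.version == nodes[0].version for n in nodes))  # usually True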

    A Demand Based Load Balanced Service Replication Model

    Cloud computing allows service users and providers to access applications, logical resources and files on any computer with ease. A cloud service has three distinct characteristics that differentiate it from traditional hosting: it is sold on demand, typically by the minute or the hour; it is elastic, so capacity and capabilities can be added on the fly without investing in new infrastructure, training new personnel, or licensing new software; and it promises reliable services delivered through next-generation data centers built on compute and storage virtualization technologies. It must also address key issues such as scalability, reliability, fault tolerance and file load balancing. One way to achieve this is service replication across different machines coupled with load balancing. Although replication potentially improves fault tolerance, it raises the problem of keeping replicas consistent when a service is updated or modified, while too few replicas decrease concurrency and the level of service availability. A balanced synchronization between the replication mechanism and consistency not only ensures a highly reliable and fault-tolerant system but also improves system performance significantly. This paper presents a load-balancing-based service replication model that creates a replica on other servers based on the number of service accesses. The simulation results indicate that the proposed model reduces the number of messages exchanged for service replication by 25-55%, improving overall system performance significantly. In the case of CPU-load-based file replication, file access time is observed to reduce by 5.56%-7.65%.
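
    The replication trigger the abstract describes, creating a replica once a service's access count justifies it, can be sketched in a few lines of Python. The threshold value, server names and least-loaded routing rule below are assumptions made for illustration, not details taken from the paper.

        REPLICATION_THRESHOLD = 100  # accesses before another replica is spawned

        class Server:
            def __init__(self, name):
                self.name = name
                self.services = {}   # service name -> access count on this server

            def load(self):
                return sum(self.services.values())

        def access(service, servers):
            # Route the request to the least-loaded replica, then replicate the
            # service onto the least-loaded non-hosting server once demand
            # crosses the threshold.
            replicas = [s for s in servers if service in s.services]
            target = min(replicas, key=Server.load)
            target.services[service] += 1
            if target.services[service] % REPLICATION_THRESHOLD == 0:
                candidates = [s for s in servers if service not in s.services]
                if candidates:
                    min(candidates, key=Server.load).services[service] = 0

        servers = [Server("s%d" % i) for i in range(3)]
        servers[0].services["report"] = 0     # the service starts on one server
        for _ in range(250):
            access("report", servers)
        print([(s.name, s.services.get("report")) for s in servers])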