
    An enhanced dynamic replica creation and eviction mechanism in data grid federation environment

    A Data Grid Federation is an infrastructure that connects several grid systems to facilitate the sharing of large amounts of data, as well as storage and computing resources. Existing data replication mechanisms compute file values from the number of file accesses when deciding which file to replicate, and place new replicas on locations that provide minimum read cost. This thesis presents an enhanced data replication strategy, the Dynamic Replica Creation and Eviction Mechanism (DRCEM), which improves the usage of data grid resources by allocating replicas to appropriate sites around the federation. DRCEM instead computes file values from logical dependencies when deciding which file to replicate, and allocates new replicas to locations that provide minimum replica placement cost. The proposed mechanism uses three schemes: 1) a Dynamic Replica Evaluation and Creation Scheme, 2) a Replica Placement Scheme, and 3) a Dynamic Replica Eviction Scheme. DRCEM was evaluated using the OptorSim network simulator on four performance metrics: 1) job completion times, 2) effective network usage, 3) storage element usage, and 4) computing element usage. DRCEM outperforms the ELALW and DRCM mechanisms by 30% and 26% respectively in job completion times, and consumes 42% and 40% less storage than ELALW and DRCM. However, DRCEM shows lower performance than existing mechanisms on computing element usage, due to the additional computation of files' logical dependencies. Overall, the results reveal better job completion times with lower resource consumption than existing approaches. This research produces three replication schemes embodied in one mechanism that enhances the performance of the Data Grid Federation environment. It extends the existing mechanism to decide whether to create or evict more than one file at a particular time, and integrates files' logical dependencies into the replica creation scheme to evaluate data files more accurately.
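    A minimal sketch of the replica-decision idea the abstract describes: score a file by its logical dependencies and access count, then place the replica at the lowest-cost site. The scoring weights, threshold, and all names here are illustrative assumptions, not the thesis's exact model.

```python
# Hedged sketch of a DRCEM-style decision: value a file by logical
# dependencies plus access frequency, place at minimum placement cost.
# Weights and threshold below are assumed for illustration only.

def replica_value(access_count: int, dependent_files: int) -> float:
    # Weight logical dependencies alongside raw access frequency (assumed weights).
    return 0.6 * dependent_files + 0.4 * access_count

def choose_replica_site(sites: dict[str, float]) -> str:
    # sites maps site name -> estimated replica placement cost; pick the cheapest.
    return min(sites, key=sites.get)

value = replica_value(access_count=120, dependent_files=8)
if value > 50:  # hypothetical creation threshold
    target = choose_replica_site({"site_A": 3.2, "site_B": 1.7, "site_C": 2.4})
    print(f"create replica at {target}")
```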

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process, and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks, and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication, and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems, not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems in order to better understand their goals and methodology, which helps evaluate their applicability to similar problems. The taxonomy also provides a "gap analysis" of the area, through which researchers can identify new issues for investigation. We also hope that the proposed taxonomy and mapping give new practitioners an easy way to understand this complex area of research. Comment: 46 pages, 16 figures, Technical Report.

    Analyse the Performance of Mobile Peer to Peer Network using Ant Colony Optimization

    A mobile peer-to-peer network is one in which each computer can act as a client or a server for the other computers in the network. Communication among nodes in such a network requires a large number of messages. To reduce this message-passing overhead, we propose an interconnection structure called the Distributed Spanning Tree (DST), which improves the efficiency of the mobile peer-to-peer network. The proposed method improves data availability and consistency across the entire network, and reduces data latency and the number of message passes required for any specific application. To further enhance its effectiveness, the DST network is optimized with the Ant Colony Optimization (ACO) method, which yields an optimal configuration of the DST and increases the availability, consistency, and scalability of the network. The simulation results show that the approach reduces the number of messages sent for any specific application and the average delay, and increases the packet delivery ratio in the network.
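    To make the optimization step concrete, here is a toy ant colony loop that biases route choice toward low-delay links via pheromone reinforcement. The link delays, parameters, and update rule are textbook ACO defaults, not the paper's exact formulation.

```python
# Hedged ACO sketch: ants repeatedly pick links with probability proportional
# to pheromone^alpha * (1/delay)^beta, and reinforce short-delay choices.
import random

links = {"A-B": 4.0, "A-C": 2.0, "B-C": 1.0}   # hypothetical link delays
pheromone = {l: 1.0 for l in links}
alpha, beta, rho = 1.0, 2.0, 0.1               # influence exponents, evaporation rate

def pick_link() -> str:
    # Roulette-wheel selection over the combined pheromone/heuristic weights.
    weights = {l: pheromone[l]**alpha * (1.0 / links[l])**beta for l in links}
    r = random.uniform(0, sum(weights.values()))
    for l, w in weights.items():
        r -= w
        if r <= 0:
            return l
    return l

for _ in range(100):                            # one ant per iteration
    chosen = pick_link()
    for l in pheromone:                         # evaporate all trails
        pheromone[l] *= (1 - rho)
    pheromone[chosen] += 1.0 / links[chosen]    # reinforce inversely to delay

print(max(pheromone, key=pheromone.get))        # link favoured by the colony
```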

    Maintaining Replica Consistency Over Large-Scale Data Grid Using Update Propagation Technique

    A Data Grid is an organized collection of nodes in a wide area network that contributes computation, data storage, and applications. In a Data Grid, large numbers of users are distributed across a wide-area environment that is dynamic and heterogeneous. Data management is one of the current issues, with data transparency, consistency, fault tolerance, automatic management, and performance being the user-facing parameters in a grid environment. Data management techniques must scale up while addressing the autonomy, dynamicity, and heterogeneity of the data resources. Data replication is a well-known technique used to reduce access latency and improve availability and performance in a distributed computing environment. Replication introduces the problem of maintaining consistency among the replicas when files are allowed to be updated: update information must be propagated to all replicas to guarantee correct reads of remote replicas. Asynchronous replication is a commonly agreed solution to this replica consistency problem. A few studies have addressed replica consistency in Data Grids, but the techniques introduced are neither efficient nor scalable; they cannot be used in a real Data Grid because they do not address the large number of replica sites, large-scale distribution, load balancing, and site autonomy (the ability of a grid site to join and leave the grid community at any time). This thesis proposes a new asynchronous replication protocol called Update Propagation Grid (UPG) to maintain replica consistency over a large-scale Data Grid. In UPG, updates reach all online secondary replicas through a propagation technique based on nodes organized into a logical two-dimensional grid structure. The proposed update propagation technique is a hybrid push-pull, dynamic technique that addresses site autonomy, efficiency, scalability, load balancing, and fairness. Two performance analysis studies were conducted to compare the proposed technique with other techniques: the first involves mathematical and simulation analysis, and the second is based on a queuing network model. The results show that the proposed technique scales well with a high number of replica sites and high request loads, reduces the average update reach time by 5% to 97%, and achieves load balancing while providing update propagation fairness.
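    A minimal sketch of push-style propagation over a logical two-dimensional grid of replica sites, as the abstract outlines: the primary pushes along its row, and each row member pushes down its column, reaching every site in O(rows + cols) hops. UPG's hybrid pull path and fairness logic are omitted; structure and names are assumptions for illustration.

```python
# Hedged sketch: row-then-column push propagation over a 2D logical grid.
ROWS, COLS = 3, 4
version = [[0] * COLS for _ in range(ROWS)]     # last update version seen per site

def propagate(update_version: int, primary=(0, 0)) -> None:
    pr, _ = primary
    for c in range(COLS):                       # push along the primary's row
        version[pr][c] = update_version
        for r in range(ROWS):                   # each row member pushes its column
            version[r][c] = update_version

propagate(1)
assert all(v == 1 for row in version for v in row)  # every replica is consistent
```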

    Review and Comparison of Intelligent Optimization Modelling Techniques for Energy Forecasting and Condition-Based Maintenance in PV Plants

    Within the field of soft computing, intelligent optimization modelling techniques include several major techniques in artificial intelligence. These techniques aim to generate new business knowledge by transforming sets of raw data into business value. One of their principal applications is the design of predictive analytics for improving advanced condition-based maintenance (CBM) strategies and energy production forecasting. These advanced techniques can be used to transform control system data, operational data, and maintenance event data into failure diagnostic and prognostic knowledge and, ultimately, to derive expected energy generation. One set of systems where these techniques can be applied with massive potential impact is the legacy monitoring systems in solar PV energy generation plants. These systems produce a great amount of data over time, while at the same time demanding significant effort to increase their performance through more accurate predictive analytics that reduce production losses, which has a direct impact on ROI. How to choose the most suitable techniques is one of the problems to address. This paper presents a review and comparative analysis of six intelligent optimization modelling techniques applied to a PV plant case study, using the energy production forecast as the decision variable. The proposed methodology not only aims to elicit the most accurate solution but also validates the results by comparing the outputs of the different techniques.
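    A hedged sketch of the comparison workflow the abstract describes: score several candidate forecasting models with cross-validation and keep the most accurate. Synthetic data stands in for the plant's monitoring records; the model set and features are assumptions, not the paper's six techniques.

```python
# Illustrative model comparison for PV energy-production forecasting.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))                  # e.g. irradiance, temperature, wind
y = 5 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=500)  # toy production

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "mlp": MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean R^2 = {score:.3f}")   # pick the highest-scoring model
```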

    Academic Cloud Computing Research: Five Pitfalls and Five Opportunities

    This discussion paper argues that there are five fundamental pitfalls which can restrict academics from conducting cloud computing research at the infrastructure level, where the vast majority of academic research currently lies. Instead, academics should be conducting higher-risk research in order to gain understanding and open up entirely new areas. We call for a renewed mindset and argue that academic research should focus less on physical infrastructure and embrace the abstractions provided by clouds through five opportunities: user-driven research, new programming models, PaaS environments, and improved tools to support elasticity and large-scale debugging. The objective of this paper is to foster discussion and to define a roadmap forward that will allow academia to make longer-term impacts on the cloud computing community. Comment: Accepted and presented at the 6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud'14).

    Data Replication and Its Alignment with Fault Management in the Cloud Environment

    Nowadays, exponential data growth is one of the major challenges worldwide; it can cause a series of negative impacts such as network overloading, high system complexity, and inadequate data security. Cloud computing has been developed as a novel paradigm that alleviates massive data processing challenges with its on-demand services and distributed architecture. Data replication has been proposed to strategically distribute the data access load across multiple cloud data centres by creating multiple copies of the data at those centres. A replica-enabled cloud environment not only achieves lower response times, higher data availability, and more balanced resource load, but also protects the cloud environment against upcoming faults. A reactive fault tolerance strategy is also required to handle faults once they have occurred; as a result, data replication strategies should be aligned with reactive fault tolerance strategies to achieve a complete management chain in the cloud environment. In this thesis, a data replication and fault management framework is proposed to establish decentralised, overarching management of the cloud environment. Three data replication strategies are first proposed on top of this framework. A replica creation strategy is proposed to reduce the total cost by jointly considering data dependency and access frequency in the replica creation decision-making process. In addition, a cloud-map-oriented, cost-efficiency-driven replica creation strategy is proposed to achieve the optimal cost reduction per replica in the cloud environment. Local and remote data relationships are further analysed by introducing two novel data dependency types, Within-DataCentre Data Dependency and Between-DataCentre Data Dependency, according to data location. Furthermore, a network-performance-based replica selection strategy is proposed to avoid potential network overloading and to increase the number of concurrently running instances.
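    A small sketch of the joint cost test the abstract describes: weigh a file's access frequency together with its within- and between-data-centre dependencies against the cost of a new replica. The cost model, weights, and names are assumptions for illustration, not the thesis's formulation.

```python
# Hedged sketch: replicate only when the estimated benefit exceeds the cost.

def replication_benefit(access_freq: float,
                        within_dc_deps: int,
                        between_dc_deps: int) -> float:
    # Between-data-centre dependencies weigh more, since remote reads
    # consume more bandwidth (assumed weights).
    return access_freq * (1 + 0.5 * within_dc_deps + 2.0 * between_dc_deps)

def should_replicate(benefit: float, replica_cost: float) -> bool:
    return benefit > replica_cost

benefit = replication_benefit(access_freq=30.0, within_dc_deps=2, between_dc_deps=3)
print(should_replicate(benefit, replica_cost=150.0))  # True: benefit outweighs cost
```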