
    Efficient replication of large volumes of data and maintaining data consistency by using P2P techniques in Desktop Grid

    Desktop Grid (DG) is increasing in popularity because of its relatively low cost and good performance in institutions. Data-intensive applications require data management in scientific experiments conducted by researchers and scientists on Desktop Grid-based Distributed Computing Infrastructure (DCI), and some of these applications deal with large volumes of data. Several solutions for data-intensive applications have been proposed for Desktop Grid, but they are not efficient in handling large volumes of data. Data management in this environment covers data access and integration, maintaining the basic properties of databases, architectures for querying data, and so on. Data in data-intensive applications has to be replicated across multiple nodes to improve availability and reduce response time. Peer-to-Peer (P2P) is a well-established technique for handling large volumes of data, is widely used on the internet, and operates in an environment similar to that of DG. However, existing P2P-based solutions offering a generic architecture for replicating large volumes of data do not perform efficiently in DG-based DCI; a generic architecture is therefore needed for replicating large volumes of data efficiently using P2P in a BOINC-based Desktop Grid. Present solutions for data-intensive applications deal mainly with read-only data, but new types of applications are emerging that handle large volumes of read/write data. In emerging scientific experiments, some DG nodes generate a new snapshot of scientific data at regular intervals by updating some of the values of existing data fields, and this updated data has to be synchronised across all DG nodes to maintain consistency. The performance of data management in DG can be improved by addressing efficient data replication and consistency together; algorithms are therefore needed that provide read/write consistency along with replication for large volumes of data in a BOINC-based Desktop Grid. This research identifies efficient solutions for replicating large volumes of data and maintaining read/write consistency using Peer-to-Peer techniques in a BOINC-based Desktop Grid, and this thesis presents the solutions developed in the course of that research.
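    To make the snapshot-synchronisation idea concrete, the sketch below shows one minimal way such consistency could be kept: each node tags its snapshot with a version number and pushes updated fields to its peers, which apply only strictly newer versions. The class and method names are illustrative assumptions, not the thesis's actual design.

```python
# Minimal sketch of version-based snapshot synchronisation between
# Desktop Grid nodes; names and the push-based propagation are
# assumptions for illustration, not the thesis's algorithm.

class ReplicaNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.version = 0      # version of the snapshot held locally
        self.data = {}        # field name -> value
        self.peers = []       # other ReplicaNode instances

    def publish_snapshot(self, updated_fields):
        """Create a new snapshot by updating some fields, then push it
        so every replica converges to the same state."""
        self.data.update(updated_fields)
        self.version += 1
        for peer in self.peers:
            peer.receive_snapshot(self.version, updated_fields)

    def receive_snapshot(self, version, updated_fields):
        # Apply only strictly newer snapshots, so duplicate or delayed
        # P2P messages cannot roll a replica back.
        if version > self.version:
            self.data.update(updated_fields)
            self.version = version

a, b = ReplicaNode("node-a"), ReplicaNode("node-b")
a.peers, b.peers = [b], [a]
a.publish_snapshot({"temperature": 21.5})
assert b.data == {"temperature": 21.5} and b.version == 1
```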

    Data Replication-Based Scheduling in Cloud Computing Environment

    High-performance computing and vast storage are two key factors required for executing data-intensive applications. In comparison with traditional distributed systems such as the data grid, cloud computing provides these factors on a more affordable, scalable and elastic platform. Furthermore, accessing data files is critical for such applications: data access can become a bottleneck for the whole cloud workflow system and dramatically decrease its performance. Job scheduling and data replication are two important techniques that can enhance the performance of data-intensive applications, and it pays to integrate them into one framework with a single objective. In this paper, we integrate data replication and job scheduling with the aim of reducing response time by reducing data access time in a cloud computing environment; we call this approach data replication-based scheduling (DRBS). Simulation results show the effectiveness of our algorithm in comparison with well-known algorithms such as random and round-robin.
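    The sketch below illustrates the general shape of the replication-aware scheduling described here: jobs prefer nodes that already hold their input data, and files that are repeatedly fetched remotely get replicated. The class layout, threshold, and tie-breaking are assumptions for the sketch, not the DRBS algorithm itself.

```python
# Illustrative sketch of replication-aware job scheduling; the data
# structures, REPLICATION_THRESHOLD, and tie-breaking by queue length
# are assumptions, not the paper's exact method.

from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    stored_files: set = field(default_factory=set)
    queue_length: int = 0

REPLICATION_THRESHOLD = 3            # remote reads before we replicate
remote_reads = defaultdict(int)      # file -> remote access count

def schedule(input_file, nodes):
    """Pick a node for a job needing input_file, replicating hot files."""
    # 1) Prefer nodes that already hold the file: no transfer cost.
    local = [n for n in nodes if input_file in n.stored_files]
    if local:
        chosen = min(local, key=lambda n: n.queue_length)
    else:
        # 2) Otherwise use the least-loaded node and count a remote read.
        chosen = min(nodes, key=lambda n: n.queue_length)
        remote_reads[input_file] += 1
        # 3) Replicate files that keep being fetched remotely, so later
        #    jobs find them locally and response time drops.
        if remote_reads[input_file] >= REPLICATION_THRESHOLD:
            chosen.stored_files.add(input_file)
    chosen.queue_length += 1
    return chosen

cluster = [Node("n1", {"genome.dat"}), Node("n2")]
print(schedule("genome.dat", cluster).name)   # n1: data-local choice
```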

    Unifying data and replica placement for data-intensive services in geographically distributed clouds

    The increased reliance of data management applications on cloud computing technologies has rendered research in identifying solutions to the data placement problem to be of paramount importance. The objective of the classical data placement problem is to optimally partition, while also allowing for replication, the set of data-items into distributed data centers to minimize the overall network communication cost. Despite significant advancement in data placement research, replica placement has seldom been studied in unison with data placement. More specifically, most of the existing solutions employ a two-phase approach: 1) data placement, followed by 2) replication. Replication should however be seen as an integral part of data placement, and should be studied as a joint optimization problem with the latter. In this paper, we propose a unified paradigm of data placement, called CPR, which combines data placement and replication of data-intensive services into geographically distributed clouds as a joint optimization problem. Underneath CPR lies an overlapping correlation clustering algorithm capable of assigning a data-item to multiple data centers, thereby enabling us to jointly solve data placement and replication. Experiments on a real-world trace-based online social network dataset show that CPR is effective and scalable. Empirically, it is approximately 35% better in efficacy on the evaluated metrics, while being up to 8 times faster in execution time when compared to state-of-the-art techniques.
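    As a toy illustration of why treating placement and replication jointly helps, the snippet below greedily assigns each data-item to the data centers from which it is accessed most, so a single assignment step yields both the placement and its replicas. This is only a greedy stand-in under assumed inputs; CPR itself relies on an overlapping correlation clustering algorithm.

```python
# Greedy toy version of joint placement-plus-replication: one step
# produces the primary location and the replicas together. Not CPR's
# algorithm; traffic numbers below are made-up example data.

def place_with_replicas(access_counts, replication_degree):
    """access_counts: {item: {data_center: number_of_accesses}}.
    Returns {item: [data_center, ...]} with replicas included."""
    placement = {}
    for item, counts in access_counts.items():
        # Rank data centers by access volume; keep the top r of them,
        # minimising cross-region traffic for this item.
        ranked = sorted(counts, key=counts.get, reverse=True)
        placement[item] = ranked[:replication_degree]
    return placement

traffic = {"profile_42": {"eu": 120, "us": 80, "asia": 10}}
print(place_with_replicas(traffic, replication_degree=2))
# {'profile_42': ['eu', 'us']}
```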

    Scalable, Data-intensive Network Computation

    To enable groups of collaborating researchers at different locations to effectively share large datasets and investigate their spontaneous hypotheses on the fly, we are interested in developing a distributed system that can be easily leveraged by a variety of data-intensive applications. The system is composed of (i) a number of best-effort logistical depots to enable large-scale data sharing and in-network data processing, (ii) a set of end-to-end tools to effectively aggregate, manage and schedule a large number of network computations with attendant data movements, and (iii) a Distributed Hash Table (DHT) on top of the generic depot services for scalable data management. The logistical depot is extended by following the end-to-end principles and is modeled with a closed queuing network model. Its performance characteristics are studied by solving the steady-state distributions of the model using local balance equations. The modeling results confirm that the wide area network is the performance bottleneck and that running concurrent jobs can increase resource utilization and system throughput. As a novel contribution, techniques to effectively support resource-demanding data-intensive applications using the fine-grained depot services are developed. These techniques include instruction-level scheduling of operations, dynamic co-scheduling of computation and replication, and adaptive workload control. Experiments in volume visualization have proved the effectiveness of these techniques. Due to the unique characteristics of data-intensive applications and our co-scheduling algorithm, a DHT is implemented on top of the basic storage and computation services. It demonstrates the potential of the Logistical Networking infrastructure to serve as a service creation platform.
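    The snippet below sketches the kind of key-to-depot mapping a DHT layered on the depot services could perform, using plain consistent hashing; the depot names and the single hash ring without virtual nodes are simplifying assumptions, not the system's actual implementation.

```python
# Minimal consistent-hashing sketch of a DHT over storage depots.
# Assumptions: SHA-1 ring positions, one point per depot (no virtual
# nodes), illustrative depot names.

import hashlib
from bisect import bisect_right

class SimpleDHT:
    def __init__(self, depots):
        # Place every depot at a point on a hash ring.
        self.by_hash = {self._hash(d): d for d in depots}
        self.ring = sorted(self.by_hash)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    def lookup(self, key):
        """The responsible depot is the first one clockwise from the
        key's position on the ring (wrapping around at the end)."""
        idx = bisect_right(self.ring, self._hash(key)) % len(self.ring)
        return self.by_hash[self.ring[idx]]

dht = SimpleDHT(["depot-a", "depot-b", "depot-c"])
print(dht.lookup("dataset/volume-17"))   # deterministic depot choice
```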

    Secure Transmission in Wireless Sensor Networks Data Using Linear Kolmogorov Watermarking Technique

    In wireless sensor networks (WSNs), all communications between nodes are sent out in broadcast fashion. These networks are used in a variety of applications, including military, environmental, and smart spaces. Sensors are susceptible to various types of attack, such as data modification, data insertion and deletion, or even physical capture and sensor replacement, so security becomes an important issue in WSNs. However, because sensors are resource-constrained, traditional computation-intensive security techniques based on data encryption are not well suited to WSNs. This paper proposes the Linear Kolmogorov watermarking technique for secure data communication in WSNs. We provide a security analysis to show the robustness of the proposed technique against various types of attacks; it is robust against data deletion, packet replication and the Sybil attack.
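    To show what lightweight, encryption-free integrity protection can look like in this setting, the sketch below embeds a keyed MAC in the least-significant bits of 8-bit sensor readings. This is a generic LSB watermark for illustration only, not the Linear Kolmogorov scheme the paper proposes; the shared key and HMAC-SHA256 are assumptions.

```python
# Generic keyed LSB watermark over 8-bit sensor readings; NOT the
# paper's Linear Kolmogorov technique. Assumes a pre-shared key and
# at most 256 readings per batch (one MAC bit per reading).

import hmac, hashlib

SECRET_KEY = b"pre-shared-sensor-key"   # assumed pre-shared key

def embed(readings):
    """Hide a keyed MAC, one bit per reading, in each reading's LSB."""
    payload = bytes(r & 0xFE for r in readings)   # readings, LSBs cleared
    mac = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    bits = ((mac[i // 8] >> (i % 8)) & 1 for i in range(len(readings)))
    return [(r & 0xFE) | b for r, b in zip(readings, bits)]

def verify(readings):
    """Re-derive the MAC from the masked readings; any modification,
    insertion or deletion changes the MAC and fails the check."""
    return embed(list(readings)) == list(readings)

marked = embed([200, 117, 64, 33, 90, 18, 250, 7])
print(verify(marked))                          # True
print(verify([v ^ 0x10 for v in marked]))      # False (with high probability)
```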

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research.

    ILP-based energy minimization techniques for banked memories

    Main memories can consume a significant portion of overall energy in many data-intensive embedded applications. One way of reducing this energy consumption is banking, that is, dividing available memory space into multiple banks and placing unused (idle) memory banks into low-power operating modes. Prior work investigated code-restructuring- and data-layout-reorganization-based approaches for increasing the energy benefits that could be obtained from a banked memory architecture. This article explores different techniques that can potentially coexist within the same optimization framework for maximizing the benefits of low-power operating modes. These techniques include employing nonuniform bank sizes, data migration, data compression, and data replication. By using these techniques, we try to increase the chances of utilizing low-power operating modes more effectively, and achieve further energy savings over what could be achieved by exploiting low-power modes alone. Specifically, nonuniform banking tries to match bank sizes with application-data access patterns. The goal of data migration is to cluster data with similar access patterns in the same set of banks. Data compression reduces the size of the data used by an application, and thus helps reduce the number of memory banks occupied by data. Finally, data replication increases bank idleness by duplicating select read-only data blocks across banks. We formulate each of these techniques as an ILP (integer linear programming) problem, and solve them using a commercial solver. Our experimental analysis using several benchmarks indicates that all the techniques presented in this framework are successful in reducing memory energy consumption. Based on our experience with these techniques, we recommend that compiler writers targeting banked memories consider data compression, replication, and migration.
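    As a toy instance of this style of formulation, the ILP below packs data blocks into as few banks as possible so idle banks can enter low-power modes. It uses the open-source PuLP solver in place of the commercial one mentioned above, with made-up block sizes and a uniform bank capacity for brevity.

```python
# Toy ILP in the spirit of the article's formulations: minimise the
# number of active banks (an energy proxy) subject to capacity.
# Assumptions: PuLP/CBC solver, uniform capacities, example sizes.

from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpBinary

blocks = {"A": 4, "B": 3, "C": 5}        # block -> size in KB (made up)
banks = ["bank0", "bank1", "bank2"]
BANK_CAPACITY = 8                        # KB per bank

prob = LpProblem("bank_energy", LpMinimize)
x = {(b, k): LpVariable(f"x_{b}_{k}", cat=LpBinary)
     for b in blocks for k in banks}     # x[b,k] = 1 iff block b in bank k
y = {k: LpVariable(f"y_{k}", cat=LpBinary) for k in banks}  # bank active?

prob += lpSum(y.values())                # objective: fewest active banks
for b in blocks:                         # every block stored exactly once
    prob += lpSum(x[b, k] for k in banks) == 1
for k in banks:                          # capacity, coupled to activity
    prob += lpSum(blocks[b] * x[b, k] for b in blocks) <= BANK_CAPACITY * y[k]

prob.solve()
for (b, k), var in x.items():
    if var.value() == 1:
        print(f"{b} -> {k}")             # inactive banks can be powered down
```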

    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions involved in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies for an optimized solution to a specific real-world problem, big data systems are no exception. As far as the storage aspect of any big data system is concerned, the primary facet is the storage infrastructure, and NoSQL appears to be the technology that fulfills its requirements. However, every big data application has its own data characteristics, and the corresponding data therefore fits a different data model. This paper presents a feature and use-case analysis and comparison of the four main data models, namely document-oriented, key-value, graph and wide-column. Moreover, a feature analysis of 80 NoSQL solutions is provided, elaborating on the criteria and points that a developer must consider while making a choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution; this brings the second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings and possible use cases of the available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.
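    A small illustration (not from the paper) of why data characteristics drive the data-model choice: the same user record stored as an opaque key-value pair versus a structured document. The record content is invented for the example.

```python
# Key-value vs document modeling of one record. In the key-value
# model the value is opaque: lookups work only by key. In the
# document model the structure is visible to the store, enabling
# field-level queries and secondary indexes.

import json

# Key-value model: addressable only by its key.
kv_store = {"user:42": json.dumps({"name": "Ada", "follows": [7, 19]})}

# Document model: nested fields remain queryable.
document = {"_id": 42, "name": "Ada", "follows": [7, 19],
            "address": {"city": "London"}}
print(document["address"]["city"])    # field-level access: London
```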