401 research outputs found

    Data replication and update propagation in XML P2P data management systems

    Get PDF
    XML P2P data management systems are P2P systems that use XML as the underlying data format shared between peers in the network. These systems aim to bring the benefits of XML and P2P systems to the distributed data management field. However, P2P systems are known for their lack of central control and high degree of autonomy. Peers may leave the network at any time at will, increasing the risk of data loss. Despite this, most research in XML P2P systems focus on novel and efficient XML indexing and retrieval techniques. Mechanisms for ensuring data availability in XML P2P systems has received comparatively little attention. This project attempts to address this issue. We design an XML P2P data management framework to improve data availability. This framework includes mechanisms for wide-spread data replication, replica location and update propagation. It allows XML documents to be broken down into fragments. By doing so, we aim to reduce the cost of replicating data by distributing smaller XML fragments throughout the network rather than entire documents. To tackle the data replication problem, we propose a suite of selection and placement algorithms that may be interchanged to form a particular replication strategy. To support the placement of replicas anywhere in the network, we use a Fragment Location Catalogue, a global index that maintains the locations of replicas. We also propose a lazy update propagation algorithm to propagate updates to replicas. Experiments show that the data replication algorithms improve data availability in our experimental network environment. We also find that breaking XML documents into smaller pieces and replicating those instead of whole XML documents considerably reduces the replication cost, but at the price of some loss in data availability. For the update propagation tests, we find that the probability that queries return up-to-date results increases, but improvements to the algorithm are necessary to handle environments with high update rates

    Transactions and data management in NoSQL cloud databases

    Get PDF
    NoSQL databases have become the preferred option for storing and processing data in cloud computing as they are capable of providing high data availability, scalability and efficiency. But in order to achieve these attributes, NoSQL databases make certain trade-offs. First, NoSQL databases cannot guarantee strong consistency of data. They only guarantee a weaker consistency which is based on eventual consistency model. Second, NoSQL databases adopt a simple data model which makes it easy for data to be scaled across multiple nodes. Third, NoSQL databases do not support table joins and referential integrity which by implication, means they cannot implement complex queries. The combination of these factors implies that NoSQL databases cannot support transactions. Motivated by these crucial issues this thesis investigates into the transactions and data management in NoSQL databases. It presents a novel approach that implements transactional support for NoSQL databases in order to ensure stronger data consistency and provide appropriate level of performance. The novelty lies in the design of a Multi-Key transaction model that guarantees the standard properties of transactions in order to ensure stronger consistency and integrity of data. The model is implemented in a novel loosely-coupled architecture that separates the implementation of transactional logic from the underlying data thus ensuring transparency and abstraction in cloud and NoSQL databases. The proposed approach is validated through the development of a prototype system using real MongoDB system. An extended version of the standard Yahoo! Cloud Services Benchmark (YCSB) has been used in order to test and evaluate the proposed approach. Various experiments have been conducted and sets of results have been generated. The results show that the proposed approach meets the research objectives. It maintains stronger consistency of cloud data as well as appropriate level of reliability and performance

    Data management in cloud environments: NoSQL and NewSQL data stores

    Get PDF
    : Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and are facing challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volume of data. Because of the large number and diversity of existing NoSQL and NewSQL solutions, it is difficult to comprehend the domain and even more challenging to choose an appropriate solution for a specific task. Therefore, this paper reviews NoSQL and NewSQL solutions with the objective of: (1) providing a perspective in the field, (2) providing guidance to practitioners and researchers to choose the appropriate data store, and (3) identifying challenges and opportunities in the field. Specifically, the most prominent solutions are compared focusing on data models, querying, scaling, and security related capabilities. Features driving the ability to scale read requests and write requests, or scaling data storage are investigated, in particular partitioning, replication, consistency, and concurrency control. Furthermore, use cases and scenarios in which NoSQL and NewSQL data stores have been used are discussed and the suitability of various solutions for different sets of applications is examined. Consequently, this study has identified challenges in the field, including the immense diversity and inconsistency of terminologies, limited documentation, sparse comparison and benchmarking criteria, and nonexistence of standardized query languages

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Get PDF
    Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective. The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research

    Key-CRDT stores

    Get PDF
    Dissertação para obtenção do Grau de Mestre em Engenharia InformáticaThe Internet has opened opportunities to create world scale services. These systems require highavailability and fault tolerance, while preserving low latency. Replication is a widely adopted technique to provide these properties. Different replication techniques have been proposed through the years, but to support these properties for world scale services it is necessary to trade consistency for availability, fault-tolerance and low latency. In weak consistency models, it is necessary to deal with possible conflicts arising from concurrent updates. We propose the use of conflict free replicated data types (CRDTs) to address this issue. Cloud computing systems support world scale services, often relying on Key-Value stores for storing data. These systems partition and replicate data over multiple nodes, that can be geographically disperse over the network. For handling conflict, these systems either rely on solutions that lose updates (e.g. last-write-wins) or require application to handle concurrent updates. Additionally, these systems provide little support for transactions, a widely used abstraction for data access. In this dissertation, we present the design and implementation of SwiftCloud, a Key-CRDT store that extends a Key-Value store by incorporating CRDTs in the system’s data-model. The system provides automatic conflict resolution relying on properties of CRDTs. We also present a version of SwiftCloud that supports transactions. Unlike traditional transactional systems, transactions never abort due to write/write conflicts, as the system leverages CRDT properties to merge concurrent transactions. For implementing SwiftCloud, we have introduced a set of new techniques, including versioned CRDTs, composition of CRDTs and alternative serialization methods. The evaluation of the system, with both micro-benchmarks and the TPC-W benchmark, shows that SwiftCloud imposes little overhead over a key-value store. Allowing clients to access a datacenter close to them with SwiftCloud, can reduce latency without requiring any complex reconciliation mechanism. The experience of using SwiftCloud has shown that adapting an existing application to use SwiftCloud requires low effort.Project PTDC/EIA-EIA/108963/200

    Self-management for large-scale distributed systems

    Get PDF
    Autonomic computing aims at making computing systems self-managing by using autonomic managers in order to reduce obstacles caused by management complexity. This thesis presents results of research on self-management for large-scale distributed systems. This research was motivated by the increasing complexity of computing systems and their management. In the first part, we present our platform, called Niche, for programming self-managing component-based distributed applications. In our work on Niche, we have faced and addressed the following four challenges in achieving self-management in a dynamic environment characterized by volatile resources and high churn: resource discovery, robust and efficient sensing and actuation, management bottleneck, and scale. We present results of our research on addressing the above challenges. Niche implements the autonomic computing architecture, proposed by IBM, in a fully decentralized way. Niche supports a network-transparent view of the system architecture simplifying the design of distributed self-management. Niche provides a concise and expressive API for self-management. The implementation of the platform relies on the scalability and robustness of structured overlay networks. We proceed by presenting a methodology for designing the management part of a distributed self-managing application. We define design steps that include partitioning of management functions and orchestration of multiple autonomic managers. In the second part, we discuss robustness of management and data consistency, which are necessary in a distributed system. Dealing with the effect of churn on management increases the complexity of the management logic and thus makes its development time consuming and error prone. We propose the abstraction of Robust Management Elements, which are able to heal themselves under continuous churn. Our approach is based on replicating a management element using finite state machine replication with a reconfigurable replica set. Our algorithm automates the reconfiguration (migration) of the replica set in order to tolerate continuous churn. For data consistency, we propose a majority-based distributed key-value store supporting multiple consistency levels that is based on a peer-to-peer network. The store enables the tradeoff between high availability and data consistency. Using majority allows avoiding potential drawbacks of a master-based consistency control, namely, a single-point of failure and a potential performance bottleneck. In the third part, we investigate self-management for Cloud-based storage systems with the focus on elasticity control using elements of control theory and machine learning. We have conducted research on a number of different designs of an elasticity controller, including a State-Space feedback controller and a controller that combines feedback and feedforward control. We describe our experience in designing an elasticity controller for a Cloud-based key-value store using state-space model that enables to trade-off performance for cost. We describe the steps in designing an elasticity controller. We continue by presenting the design and evaluation of ElastMan, an elasticity controller for Cloud-based elastic key-value stores that combines feedforward and feedback control

    SEARCH, REPLICATION AND GROUPING FOR UNSTRUCTURED P2P NETWORKS

    Get PDF
    In my dissertation, I present a suite of protocols that assist in efficient content location and distribution in unstructured Peer-to-Peer overlays. The basis of these schemes is their ability to learn from past interactions, increasing their performance with time. Peer-to-Peer (P2P) networks are gaining increasing attention from both the scientific and the large Internet user community. Popular applications utilizing this new technology offer many attractive features to a growing number of users. P2P systems have two basic functions: Content search and dissemination. Search (or lookup) protocols define how participants locate remotely maintained resources. In data dissemination, users transmit or receive content from single or multiple sites in the network. P2P applications traditionally operate under purely decentralized and highly dynamic environments. Unstructured systems represent a particularly interesting class of P2P networks. Peers form an overlay in an ad-hoc manner, without any guarantees relative to lookup performance or content availability. Resources are locally maintained, while participants have limited knowledge, usually confined to their immediate neighborhood in the overlay. My work aims at providing effective and bandwidth-efficient searching and data sharing. A suite of algorithms which provide peers in unstructured P2P overlays with the state necessary in order to efficiently locate, disseminate and replicate objects is presented. The Adaptive Probabilistic Search (APS) scheme utilizes directed walkers to forward queries on a hop-by-hop basis. Peers store success probabilities for each of their neighbors in order to efficiently route towards object holders. AGNO performs implicit grouping of peers according to the demand incentive and utilizes state maintained by APS in order to route messages from content holders towards interested peers, without requiring any subscription process. Finally, the Adaptive Probabilistic REplication (APRE) scheme expands on the state that AGNO builds in order to replicate content inside query intensive areas according to demand

    Decentralized Knowledge Graphs on the Web

    Get PDF
    corecore