
    DKVF: A Framework for Rapid Prototyping and Evaluating Distributed Key-value Stores

    We present DKVF, a framework that enables one to quickly prototype and evaluate new protocols for key-value stores and to compare them with existing protocols on selected benchmarks. Because of the limitations imposed by the CAP theorem, new protocols must be developed that achieve the desired trade-off between consistency and availability for the application at hand. Hence, both the academic and industrial communities focus on developing new protocols that identify a different (and hopefully better in one or more aspects) point on this trade-off curve. While these protocols are often based on a simple intuition, evaluating them to ensure that they indeed provide increased availability, consistency, or performance is a tedious task. DKVF enables one to quickly prototype a new protocol and to identify how it performs compared to existing protocols on pre-specified benchmarks. The framework relies on YCSB (Yahoo! Cloud Serving Benchmark) for benchmarking. We demonstrate DKVF by implementing four existing protocols (eventual consistency, COPS, GentleRain, and CausalSpartan) with it, and we compare the performance of these protocols under different load conditions. We find that the performance is similar to that of our from-scratch implementations of these protocols, and that the comparison among the protocols is consistent with what has been reported in the literature. Moreover, implementing these protocols was much more natural, as we only needed to translate the pseudocode into Java (and add the necessary error handling). Hence, it was possible to achieve this in just 1-2 days per protocol. Finally, our framework is extensible. It is possible to replace individual components in the framework (e.g., the storage component)

    A policy language definition for provenance in pervasive computing

    Recent advances in computing technology have led to the paradigm of pervasive computing, which provides a means of simplifying daily life by integrating information processing into the everyday physical world. Pervasive computing draws its power from knowing the surroundings and creates an environment that combines computing and communication capabilities. Sensors that provide high-resolution spatial and instantaneous measurements are most commonly used for forecasting, monitoring, and real-time environmental modelling. Sensor data generated by a sensor network depend on several influences, such as the configuration and location of the sensors or the processing performed on the raw measurements. Storing sufficient metadata that gives meaning to the recorded observations is important in order to draw accurate conclusions or to enhance the reliability of the result dataset that uses this automatically collected data. This kind of metadata is called provenance data, as it records the origin of the data and the process by which the data arrived from that origin. Provenance is still an exploratory field in pervasive computing, and many open research questions are yet to emerge. The context information and the different characteristics of the pervasive environment call for different approaches to a provenance support system. This work implements a policy language definition that specifies the collecting model for provenance management systems and addresses the challenges that arise with stream data and sensor environments. The structure graph of the proposed model is mapped to the Open Provenance Model in order to facilitate the sharing of provenance data and interoperability with other systems. As provenance security has been recognized as one of the most important components of any provenance system, an access control language has been developed that is tailored to support the special requirements of provenance: fine-grained policies, privacy policies, and preferences.
Experimental evaluation findings show a reasonable overhead for provenance collecting and a reasonable time for provenance query performance, while a numerical analysis was used to evaluate the storage overhead

    A Semantic Consistency Model to Reduce Coordination in Replicated Systems

    Large-scale distributed applications need to be available and responsive to satisfy millions of users, which can be achieved by geo-replicating data across multiple replicas. However, a partitioned system cannot fully sustain both availability and consistency. The use of weak consistency models might lead to data integrity violations triggered by problematic concurrent updates, such as selling the last ticket twice on an airline service. To overcome possible conflicts, programmers might opt for strong consistency, which guarantees a total order between operations while preserving data integrity. Nevertheless, maintaining the illusion of a non-replicated system affects availability. In contrast, weaker notions such as eventual consistency might be used; these boost responsiveness, as operations are executed directly at the source replica and their effects are propagated to remote replicas in the background. However, this approach might put data integrity at risk. Current protocols that preserve invariants rely on, at least, causal consistency, a consistency model that maintains causal dependencies between operations. In this dissertation, we propose a protocol that includes a semantic consistency model. This consistency model stands between eventual consistency and causal consistency. We guarantee better performance compared with causal consistency while ensuring data integrity. Through semantic analysis, relying on the static analysis tool CISE3, we manage to limit the maximum number of dependencies that each operation will have. To support the protocol, we developed a communication algorithm for a cluster of replicas. Additionally, we present an architecture that uses Akka, an actor-based middleware in which actors communicate by exchanging messages. This architecture adopts the publish/subscribe pattern and includes data persistence.
We also consider the stability of operations, as well as a dynamic cluster environment, ensuring the convergence of the replicated state. Finally, we perform an experimental evaluation of the performance of the algorithm using standard case studies. The evaluation confirms that, by relying on semantic analysis, the system requires less coordination between the replicas than causal consistency while ensuring data integrity.

    Activity Report 2012. Project-Team RMOD. Analyses and Languages Constructs for Object-Oriented Application Evolution

    Activity Report 2012. Project-Team RMOD. Analyses and Languages Constructs for Object-Oriented Application Evolution

    Big continuous data: dealing with velocity by composing event streams

    The rate at which we produce data is growing steadily, thus creating ever larger streams of continuously evolving data. Online news, micro-blogs, and search queries are just a few examples of these continuous streams of user activities. The value of these streams lies in their freshness and their relatedness to ongoing events. Modern applications consuming these streams need to extract behaviour patterns, which can be obtained by aggregating and mining huge event histories, both statically and dynamically. An event is the notification that a happening of interest has occurred. Event streams must be combined or aggregated to produce more meaningful information. By combining and aggregating them, either from multiple producers or from a single one during a given period of time, a limited set of events describing meaningful situations may be notified to consumers. Event streams, with their volume and continuous production, relate mainly to two of the characteristics attributed to Big Data by the 5V's model: volume and velocity. Techniques such as complex pattern detection, event correlation, event aggregation, event mining, and stream processing have been used for composing events. Nevertheless, to the best of our knowledge, few approaches integrate different composition techniques (online and post-mortem) for dealing with Big Data velocity. This chapter gives an analytical overview of event stream processing and composition approaches: complex event languages, services, and event querying systems on distributed logs. Our analysis underlines the challenges introduced by Big Data velocity and volume and uses them as a reference for identifying the scope and limitations of results stemming from different disciplines: networks, distributed systems, stream databases, event composition services, and data mining on traces

    Metadata and provenance management

    Scientists today collect, analyze, and generate terabytes and petabytes of data. These data are often shared and further processed and analyzed among collaborators. To facilitate sharing and interpretation, data need to carry metadata about how they were collected or generated, and provenance information about how they were processed. This chapter describes metadata and provenance in the context of the data lifecycle. It also gives an overview of the approaches to metadata and provenance management, followed by examples of how applications use metadata and provenance in their scientific processes

    A review of experiences with reliable multicast

