DKVF: A Framework for Rapid Prototyping and Evaluating Distributed Key-value Stores
We present our framework DKVF that enables one to quickly prototype and
evaluate new protocols for key-value stores and compare them with existing
protocols based on selected benchmarks. Due to the limitations imposed by the CAP theorem, new
protocols must be developed that achieve the desired trade-off between
consistency and availability for the given application at hand. Hence, both
academic and industrial communities focus on developing new protocols that
identify a different (and hopefully better in one or more aspects) point on this
trade-off curve. While these protocols are often based on a simple intuition,
evaluating them to ensure that they indeed provide increased availability,
consistency, or performance is a tedious task. Our framework, DKVF, enables one
to quickly prototype a new protocol as well as identify how it performs
compared to existing protocols for pre-specified benchmarks. Our framework
relies on YCSB (the Yahoo! Cloud Serving Benchmark) for benchmarking. We
demonstrate DKVF by implementing four existing protocols --eventual
consistency, COPS, GentleRain, and CausalSpartan-- with it. We compare the
performance of these protocols under different loading conditions. We find
that their performance is similar to that of our from-scratch implementations
of these protocols, and that the comparison among them is consistent with what
has been reported in the literature. Moreover, implementing these protocols
was much more natural, as we only needed to translate the pseudocode into Java
(and add the necessary error handling). Hence, it was possible to achieve this
in just 1-2 days per protocol. Finally, our framework is extensible. It is
possible to replace individual components in the framework (e.g., the storage
component).
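The abstract above describes prototyping replication protocols by translating pseudocode directly into code. A minimal sketch of the simplest of the four protocols, eventual consistency, with hypothetical class and method names (this is an illustration of the protocol idea, not DKVF's actual API): writes are applied at the source replica first, propagated to peers, and merged with a last-writer-wins rule.

```python
import time

class Replica:
    """Minimal sketch of an eventually consistent key-value replica.

    Hypothetical illustration of the kind of protocol one might prototype
    in a framework like DKVF; names and structure are invented here.
    """

    def __init__(self, name):
        self.name = name
        self.store = {}   # key -> (timestamp, value)
        self.peers = []   # other replicas receiving our updates

    def put(self, key, value):
        # Apply locally first; propagation would be asynchronous in a
        # real system, which is what makes the protocol only eventual.
        ts = time.time()
        self.store[key] = (ts, value)
        for peer in self.peers:
            peer.replicate(key, ts, value)

    def replicate(self, key, ts, value):
        # Last-writer-wins merge: keep the update with the newest timestamp.
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def get(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

a, b = Replica("a"), Replica("b")
a.peers = [b]
a.put("ticket", "sold")
```

Stronger protocols such as COPS or CausalSpartan would additionally attach dependency metadata to each `replicate` call and delay its application until those dependencies are satisfied.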
A policy language definition for provenance in pervasive computing
Recent advances in computing technology have led to the paradigm of pervasive computing, which provides a means of simplifying daily life by integrating information processing into the everyday physical world. Pervasive computing draws its power from knowing the surroundings and creates an environment which combines computing and communication capabilities. Sensors that provide high-resolution spatial and instant measurement are most commonly used for forecasting, monitoring and real-time environmental modelling. Sensor data generated by a sensor network depends on several influences, such as the configuration and location of the sensors or the processing performed on the raw measurements. Storing sufficient metadata that gives meaning to the recorded observation is important in order to draw accurate conclusions or to enhance the reliability of the result dataset that uses this automatically collected data. This kind of metadata is called provenance data, as the origin of the data and the process by which it arrived from its origin are recorded. Provenance is still an exploratory field in pervasive computing and many open research questions are yet to emerge. The context information and the different characteristics of the pervasive environment call for different approaches to a provenance support system.
This work implements a policy language definition that specifies the collection model for provenance management systems and addresses the challenges that arise with stream data and sensor environments. The structure graph of the proposed model is mapped to the Open Provenance Model in order to facilitate the sharing of provenance data and interoperability with other systems. As provenance security has been recognized as one of the most important components in any provenance system, an access control language has been developed that is tailored to support the special requirements of provenance: fine-grained policies, privacy policies and preferences. Experimental evaluation findings show a reasonable overhead for provenance collection and reasonable provenance query times, while a numerical analysis was used to evaluate the storage overhead.
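The abstract mentions fine-grained access control over provenance records. A minimal sketch of that idea with entirely hypothetical structures (not the dissertation's actual policy language): each policy grants a role read access to specific fields of a provenance record, and queries are filtered accordingly.

```python
# Hypothetical field-level access control over a provenance record;
# illustrates the notion of fine-grained policies, not the actual language.
record = {
    "observation": 21.5,
    "sensor_id": "s-17",
    "location": "lab-3",        # sensitive: deployment site of the sensor
    "raw_trace": [21.4, 21.6],  # raw measurements behind the observation
}

# Each policy grants a role read access to a set of record fields.
policies = {
    "analyst": {"observation", "sensor_id"},
    "admin": {"observation", "sensor_id", "location", "raw_trace"},
}

def query(record, role):
    """Return only the fields the role's policy permits."""
    allowed = policies.get(role, set())
    return {k: v for k, v in record.items() if k in allowed}
```

A privacy preference, in this framing, would simply be a further restriction intersected with the role's field set before filtering.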
A Semantic Consistency Model to Reduce Coordination in Replicated Systems
Large-scale distributed applications need to be available and responsive to satisfy millions
of users, which can be achieved by having data geo-replicated in multiple replicas.
However, a partitioned system cannot fully sustain both availability and consistency.
The usage of weak consistency models might lead to data integrity violations, triggered
by problematic concurrent updates, such as selling the last seat on a flight
twice. To overcome possible conflicts, programmers might opt to apply strong
consistency, which guarantees a total order between operations, while preserving data
integrity. Nevertheless, maintaining the illusion of a non-replicated system harms availability.
In contrast, weaker notions might be used, such as eventual consistency, which boosts
responsiveness, as operations are executed directly at the source replica and their effects
are propagated to remote replicas in the background. However, this approach might put
data integrity at risk. Current protocols that preserve invariants rely on, at least, causal
consistency, a consistency model that maintains causal dependencies between operations.
In this dissertation, we propose a protocol that includes a semantic consistency model.
This consistency model stands between eventual consistency and causal consistency. We
guarantee better performance compared with causal consistency, and ensure data integrity.
Through semantic analysis, relying on the static analysis tool CISE3, we manage
to limit the maximum number of dependencies that each operation will have. To support
the protocol, we developed a communication algorithm for a cluster of replicas. Additionally,
we present an architecture that uses Akka, an actor-based middleware in which actors
communicate by exchanging messages. This architecture adopts the publish/subscribe
pattern and includes data persistence. We also consider the stability of operations, as well
as a dynamic cluster environment, ensuring the convergence of the replicated state. Finally,
we perform an experimental evaluation regarding the performance of the algorithm
using standard case studies. The evaluation confirms that by relying on semantic analysis,
the system requires less coordination between the replicas than causal consistency,
ensuring data integrity.
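The key idea in the dissertation described above is that semantic analysis can shrink an operation's dependency set below what causal consistency requires. A minimal sketch with a hypothetical, hard-coded conflict relation (a real system would derive it with a static analysis tool such as CISE3): an operation waits only for semantically conflicting predecessors instead of its full causal history.

```python
# Hypothetical sketch: dependencies restricted to semantically conflicting
# operations, instead of the full causal history.

def conflicts(op_a, op_b):
    # Invented conflict relation for illustration: only withdrawals can
    # jointly violate a non-negative-balance invariant, so only they conflict.
    return op_a[0] == "withdraw" and op_b[0] == "withdraw"

def dependencies(op, history):
    """Under causal consistency an operation depends on every earlier
    operation in its history; under the semantic model it depends only
    on the conflicting ones."""
    return [h for h in history if conflicts(op, h)]

history = [("deposit", 10), ("deposit", 5), ("withdraw", 3)]
new_op = ("withdraw", 4)

causal_deps = history                          # all prior operations
semantic_deps = dependencies(new_op, history)  # conflicting ones only
```

Fewer dependencies means a remote replica can apply `new_op` after receiving only the conflicting predecessors, which is where the reduced coordination comes from.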
Activity Report 2012. Project-Team RMOD. Analyses and Languages Constructs for Object-Oriented Application Evolution
Big continuous data: dealing with velocity by composing event streams
The rate at which we produce data is growing steadily, thus creating ever larger streams of continuously evolving data. Online news, micro-blogs, and search queries are just a few examples of these continuous streams of user activities. The value of these streams lies in their freshness and relatedness to on-going events. Modern applications consuming these streams need to extract behaviour patterns that can be obtained by aggregating and mining, statically and dynamically, huge event histories. An event is the notification that a happening of interest has occurred. Event streams must be combined or aggregated to produce more meaningful information. By combining and aggregating them, either from multiple producers or from a single one during a given period of time, a limited set of events describing meaningful situations may be notified to consumers. Event streams, with their volume and continuous production, relate mainly to two of the characteristics attributed to Big Data by the 5V's model: volume and velocity. Techniques such as complex pattern detection, event correlation, event aggregation, event mining and stream processing have been used for composing events. Nevertheless, to the best of our knowledge, few approaches integrate different composition techniques (online and post-mortem) for dealing with Big Data velocity. This chapter gives an analytical overview of event stream processing and composition approaches: complex event languages, services and event querying systems on distributed logs. Our analysis underlines the challenges introduced by Big Data velocity and volume and uses them as a reference for identifying the scope and limitations of results stemming from different disciplines: networks, distributed systems, stream databases, event composition services, and data mining on traces.
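The chapter above argues that event streams must be aggregated so that a limited set of meaningful events reaches consumers. A minimal sketch, with hypothetical names, of one common composition technique: a tumbling-window aggregation that collapses a raw stream of timestamped events into per-window counts.

```python
from collections import Counter

def tumbling_window_counts(events, window):
    """Aggregate (timestamp, kind) events into counts per fixed-size window.

    Hypothetical illustration of stream aggregation for velocity: many raw
    events are reduced to a few summary notifications.
    """
    counts = Counter()
    for ts, kind in events:
        bucket = ts // window  # index of the window the event falls into
        counts[(bucket, kind)] += 1
    return dict(counts)

# Four raw events collapse into three (window, kind) summaries.
events = [(0, "click"), (1, "click"), (2, "query"), (5, "click")]
summary = tumbling_window_counts(events, window=3)
```

Online engines evaluate such windows incrementally as events arrive, whereas post-mortem analysis runs the same aggregation over a stored event history; integrating the two is the gap the chapter highlights.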
Metadata and provenance management
Scientists today collect, analyze, and generate terabytes and petabytes of
data. These data are often shared, and further processed and analyzed among
collaborators. In order to facilitate sharing and data interpretation, data
need to carry with them metadata about how they were collected or generated,
and provenance information about how they were processed. This chapter
describes metadata and provenance in the context of the data lifecycle. It
also gives an overview of approaches to metadata and provenance management,
followed by examples of how applications use metadata and provenance in their
scientific processes.