Blazes: Coordination Analysis for Distributed Programs
Distributed consistency is perhaps the most discussed topic in distributed
systems today. Coordination protocols can ensure consistency, but in practice
they degrade performance unless applied judiciously. Scalable
distributed architectures avoid coordination whenever possible, but
under-coordinated systems can exhibit behavioral anomalies under fault, which
are often extremely difficult to debug. This raises significant challenges for
distributed system architects and developers. In this paper we present Blazes,
a cross-platform program analysis framework that (a) identifies program
locations that require coordination to ensure consistent executions, and (b)
automatically synthesizes application-specific coordination code that can
significantly outperform general-purpose techniques. We present two case
studies, one using annotated programs in the Twitter Storm system, and another
using the Bloom declarative language.
Comment: Updated to include additional materials from the original technical report: derivation rules, output stream label
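The analysis idea can be illustrated with a toy sketch in Python (the names and labels here are ours, not Blazes' actual annotation vocabulary or derivation rules): operators whose output is insensitive to input ordering ("confluent") need no coordination, while order-sensitive operators mark the program locations where coordination must be injected.

```python
# Hedged sketch, not Blazes' analysis itself: flag dataflow edges that need
# coordination. Operators labeled "confluent" produce the same output for any
# input ordering; "order_sensitive" operators require ordered (coordinated)
# input streams.
def coordination_points(dataflow, labels):
    """dataflow: list of (upstream, downstream) operator edges.
    labels: operator name -> "confluent" or "order_sensitive"."""
    points = []
    for upstream, downstream in dataflow:
        # A nondeterministically ordered stream feeding an order-sensitive
        # operator is where coordination (e.g. sequencing) is required.
        if labels[downstream] == "order_sensitive":
            points.append((upstream, downstream))
    return points

edges = [("source", "filter"), ("filter", "aggregate")]
labels = {"filter": "confluent", "aggregate": "order_sensitive"}
print(coordination_points(edges, labels))  # [('filter', 'aggregate')]
```

The sketch captures only the labeling intuition; the paper's framework derives labels compositionally and also synthesizes the coordination code itself.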
Knowledge-infused and Consistent Complex Event Processing over Real-time and Persistent Streams
Emerging applications in Internet of Things (IoT) and Cyber-Physical Systems
(CPS) present novel challenges to Big Data platforms for performing online
analytics. Ubiquitous sensors from IoT deployments are able to generate data
streams at high velocity that include information from a variety of domains
and accumulate to large volumes on disk. Complex Event Processing (CEP) is
recognized as an important real-time computing paradigm for analyzing
continuous data streams. However, existing work on CEP is largely limited to
relational query processing, exposing two distinctive gaps for query
specification and execution: (1) infusing the relational query model with
higher level knowledge semantics, and (2) seamless query evaluation across
temporal spaces that span past, present and future events. Bridging these gaps
enables accessible analytics over data streams with properties from different
disciplines, and helps span the velocity (real-time) and volume (persistent)
dimensions. In this article, we introduce a Knowledge-infused CEP (X-CEP)
framework that provides domain-aware knowledge query constructs along with
temporal operators that allow end-to-end queries to span across real-time and
persistent streams. We translate this query model to efficient query execution
over online and offline data streams, proposing several optimizations to
mitigate the overheads introduced by evaluating semantic predicates and by
accessing high-volume historic data streams. The proposed X-CEP query model and
execution approaches are implemented in our prototype semantic CEP engine,
SCEPter. We validate our query model using domain-aware CEP queries from a
real-world Smart Power Grid application, and experimentally analyze the
benefits of our optimizations for executing these queries, using event streams
from a campus-microgrid IoT deployment.
Comment: 34 pages, 16 figures, accepted in Future Generation Computer Systems, October 27, 201
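As an illustration of a query spanning temporal spaces, here is a toy Python sketch (our own construction, not SCEPter's API) of a two-step sequence pattern whose first event may lie in the persisted archive while its completing event arrives on the live stream:

```python
# Illustrative sketch of past-to-future pattern matching: the sequence
# "first then second" may begin in the archived (persistent) stream and
# complete on the live (real-time) stream, so one query spans both.
def match_sequence(archive, live, first, second):
    matches = []
    # Past events of the first type, recovered from persistent storage.
    pending = [e for e in archive if e["type"] == first]
    for event in live:                       # present and future events
        if event["type"] == first:
            pending.append(event)
        elif event["type"] == second:
            # Pair with every earlier occurrence of the first event type.
            matches.extend((p, event) for p in pending if p["ts"] < event["ts"])
    return matches

archive = [{"type": "A", "ts": 1}]
live = [{"type": "A", "ts": 5}, {"type": "B", "ts": 9}]
print(len(match_sequence(archive, live, "A", "B")))  # 2
```

A real engine would of course avoid scanning the full archive per query; the paper's optimizations target exactly that cost.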
Theory and Practice of Transactional Method Caching
Nowadays, tiered architectures are widely accepted for constructing large
scale information systems. In this context application servers often form the
bottleneck for a system's efficiency. An application server exposes an
object-oriented interface consisting of a set of methods which are accessed by
potentially remote clients. The idea of method caching is to store results of
read-only method invocations with respect to the application server's interface
on the client side. If the client invokes the same method with the same
arguments again, the corresponding result can be taken from the cache without
contacting the server. It has been shown that this approach can considerably
improve a real world system's efficiency.
This paper extends the concept of method caching by addressing the case where
clients wrap related method invocations in ACID transactions. Demarcating
sequences of method calls in this way is supported by many important
application server standards. In this context the paper presents an
architecture, a theory and an efficient protocol for maintaining full
transactional consistency and in particular serializability when using a method
cache on the client side. In order to create a protocol for scheduling cached
method results, the paper extends a classical transaction formalism. Based on
this extension, a recovery protocol and an optimistic serializability protocol
are derived. The latter one differs from traditional transactional cache
protocols in many essential ways. An efficiency experiment validates the
approach: using the cache, a system's performance and scalability are
considerably improved.
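The basic mechanism can be sketched as follows (a minimal Python illustration of our own, not the paper's protocol or formalism): read-only results are cached by method and arguments, and entries are invalidated when a committed transaction's write set overlaps their read dependencies.

```python
# Minimal sketch of client-side method caching. Results of read-only method
# invocations are keyed by (method, args); on commit of a transaction, any
# cached result that depends on a written item is dropped.
class MethodCache:
    def __init__(self):
        self.cache = {}  # (method name, args) -> (result, dependencies)

    def call(self, method, args, deps):
        key = (method.__name__, args)
        if key not in self.cache:                 # miss: contact the server
            self.cache[key] = (method(*args), deps)
        return self.cache[key][0]

    def on_commit(self, write_set):
        # Invalidate cached results whose read set overlaps the write set.
        stale = [k for k, (_, deps) in self.cache.items() if deps & write_set]
        for k in stale:
            del self.cache[k]

db = {"x": 1}
def read_x(): return db["x"]

cache = MethodCache()
print(cache.call(read_x, (), {"x"}))  # 1 (miss: method is invoked)
db["x"] = 2                           # a transaction writes x ...
print(cache.call(read_x, (), {"x"}))  # 1 (stale, served from cache)
cache.on_commit({"x"})                # ... and commits: invalidate dependents
print(cache.call(read_x, (), {"x"}))  # 2 (fresh after invalidation)
```

The paper's contribution is precisely the part this sketch omits: a scheduling and recovery protocol that guarantees serializability for such caches.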
LogBase: A Scalable Log-structured Database System in the Cloud
Numerous applications such as financial transactions (e.g., stock trading)
are write-heavy in nature. The shift from reads to writes in web applications
has also been accelerating in recent years. Write-ahead-logging is a common
approach for providing recovery capability while improving performance in most
storage systems. However, the separation of log and application data incurs
write overheads observed in write-heavy environments and hence adversely
affects the write throughput and recovery time in the system. In this paper, we
introduce LogBase - a scalable log-structured database system that adopts
log-only storage for removing the write bottleneck and supporting fast system
recovery. LogBase is designed to be dynamically deployed on commodity clusters
to take advantage of the elastic scaling property of cloud environments. LogBase
provides in-memory multiversion indexes for supporting efficient access to data
maintained in the log. LogBase also supports transactions that bundle read and
write operations spanning across multiple records. We implemented the proposed
system and compared it with HBase and a disk-based log-structured
record-oriented system modeled after RAMCloud. The experimental results show
that LogBase is able to provide sustained write throughput, efficient data
access out of the cache, and effective system recovery.
Comment: VLDB201
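The core idea, a single log as the only persistent structure with an in-memory multiversion index pointing into it, can be sketched as follows (a simplification; class and method names are ours):

```python
# Sketch of log-only storage with an in-memory multiversion index. Writes are
# sequential appends to one log (no separate data files), and the index maps
# each key to (timestamp, offset) pairs so reads fetch versions directly from
# the log. Recovery amounts to replaying the log to rebuild the index.
class LogStore:
    def __init__(self):
        self.log = []    # the only persistent structure: (key, ts, value)
        self.index = {}  # key -> list of (ts, log offset)

    def put(self, key, value, ts):
        offset = len(self.log)
        self.log.append((key, ts, value))                    # sequential append
        self.index.setdefault(key, []).append((ts, offset))  # in-memory only

    def get(self, key, ts):
        # Latest version at or before ts, located via the index.
        versions = [v for v in self.index.get(key, []) if v[0] <= ts]
        if not versions:
            return None
        _, offset = max(versions)
        return self.log[offset][2]

store = LogStore()
store.put("k", "v1", ts=1)
store.put("k", "v2", ts=5)
print(store.get("k", ts=3))  # v1
```

Because data lives only in the log, there is no duplicate write path for log and application data, which is the write bottleneck the paper targets.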
Serializable Isolation for Snapshot Databases
Many popular database management systems implement a multiversion concurrency control algorithm called snapshot isolation rather than providing full serializability based on locking. There are well-known anomalies permitted by snapshot isolation that can lead to violations of data consistency by interleaving transactions that would maintain consistency if run serially. Until now, the only way to prevent these anomalies was to modify the applications by introducing explicit locking or artificial update conflicts, following careful analysis of conflicts between all pairs of transactions. This thesis describes a modification to the concurrency control algorithm of a database management system that automatically detects and prevents snapshot isolation anomalies at runtime for arbitrary applications, thus providing serializable isolation. The new algorithm preserves the properties that make snapshot isolation attractive, including that readers do not block writers and vice versa. An implementation of the algorithm in a relational database management system is described, along with a benchmark and performance study, showing that the throughput approaches that of snapshot isolation in most cases.
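A heavily simplified Python sketch of the runtime detection idea (the thesis tracks read-write antidependencies through the database's lock manager; the flags and names below are our abstraction): a transaction that acquires both an incoming and an outgoing antidependency edge is aborted as the potential pivot of an anomaly.

```python
# Simplified sketch of runtime anomaly detection under snapshot isolation.
# Each transaction records whether it has an incoming and an outgoing
# read-write antidependency; a transaction with both is conservatively
# aborted, which suffices to rule out the cycles snapshot isolation permits.
class Txn:
    def __init__(self, name):
        self.name = name
        self.in_conflict = False   # someone overwrote an item we read
        self.out_conflict = False  # we overwrote an item someone read
        self.aborted = False

def rw_antidependency(reader, writer):
    """Record that `writer` overwrote an item `reader` read in its snapshot."""
    reader.out_conflict = True
    writer.in_conflict = True
    for t in (reader, writer):
        if t.in_conflict and t.out_conflict:
            t.aborted = True       # pivot of a potential anomaly

t1, t2, t3 = Txn("T1"), Txn("T2"), Txn("T3")
rw_antidependency(t1, t2)   # T2 overwrote something T1 read
rw_antidependency(t2, t3)   # T3 overwrote something T2 read
print(t2.aborted)           # True: T2 sits on both edges
```

Like the real algorithm, this check is conservative: some aborted schedules would in fact have been serializable, which is the price of runtime detection without full cycle tracking.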
Cache Serializability: Reducing Inconsistency in Edge Transactions
Read-only caches are widely used in cloud infrastructures to reduce access
latency and load on backend databases. Operators view coherent caches as
impractical at genuinely large scale, and many client-facing caches are updated
in an asynchronous manner with best-effort pipelines. Existing solutions that
support cache consistency are inapplicable to this scenario since they require
a round trip to the database on every cache transaction.
Existing incoherent cache technologies are oblivious to transactional data
access, even if the backend database supports transactions. We propose T-Cache,
a novel caching policy for read-only transactions in which inconsistency is
tolerable (won't cause safety violations) but undesirable (has a cost). T-Cache
improves cache consistency despite asynchronous and unreliable communication
between the cache and the database. We define cache-serializability, a variant
of serializability that is suitable for incoherent caches, and prove that with
unbounded resources T-Cache implements this new specification. With limited
resources, T-Cache allows the system manager to choose a trade-off between
performance and consistency.
Our evaluation shows that T-Cache detects many inconsistencies with only
nominal overhead. We use synthetic workloads to demonstrate the efficacy of
T-Cache when data accesses are clustered and its adaptive reaction to workload
changes. With workloads based on real-world topologies, T-Cache detects
43-70% of the inconsistencies and increases the rate of consistent transactions
by 33-58%.
Comment: Ittay Eyal, Ken Birman, Robbert van Renesse, "Cache Serializability: Reducing Inconsistency in Edge Transactions," Distributed Computing Systems (ICDCS), IEEE 35th International Conference on, June 29 2015 -- July 2 201
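One way to picture consistency checking over an incoherent cache (our illustration, not T-Cache's actual mechanism): tag each cached object with the versions of the objects it causally depends on, and flag a read-only transaction that observes a dependency newer than what it actually read.

```python
# Illustrative consistency check for a read-only transaction over cached data.
# Each read carries (version, deps), where deps maps another key to the
# minimum version of that key the entry was written against. A transaction is
# inconsistent if it observed an entry whose dependency is newer than the
# version it read for that same key.
def consistent(reads):
    """reads: {key: (version, deps)}; deps: {key: required minimum version}."""
    for _, (_, deps) in reads.items():
        for dep_key, dep_version in deps.items():
            if dep_key in reads and reads[dep_key][0] < dep_version:
                return False   # observed a stale cache entry
    return True

fresh = {"a": (2, {}), "b": (7, {"a": 2})}   # b was written after seeing a@2
stale = {"a": (1, {}), "b": (7, {"a": 2})}   # cache still serves a@1
print(consistent(fresh), consistent(stale))  # True False
```

This matches the paper's setting in spirit: inconsistency is detected rather than prevented, so the cache stays incoherent and asynchronous while the application learns when a transaction saw a mixed snapshot.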
Exploring Performance with Apollo Federation
The growing tendency in cloud-hosted computing and availability supported a shift in software architecture to better take advantage of such technological advancements. As Monolithic Architecture started evolving and maturing, businesses grew their dependency on software solutions, which motivated the shift into Microservice Architecture.
The same shift is comparable with the evolution of monolithic GraphQL solutions which, through their growth and evolution, also required a way forward in solving some of their bottleneck issues. One of the alternatives, already chosen and proven by some enterprises, is GraphQL Federation. Due to its novelty, there is still a lack of knowledge and testing on the performance of the GraphQL Federation architecture and on how techniques such as caching strategies, batching and execution strategies impact it.
This thesis aims to address this gap by first contextualizing the different aspects of GraphQL and GraphQL Federation and investigating the available and documented enterprise scenarios to extract best practices and to better understand how to prepare such a performance evaluation.
Next, multiple alternatives underwent the Analytic Hierarchy Process to choose the best way to develop a scenario that enables performance analysis in a standard and structured way. Following this, the alternative base solutions were analysed and compared to determine the best fit for the current thesis. Functional and non-functional requirements were collected along with the rest of the design exercise to refine the solution to be tested for performance.
Finally, after the required development and implementation work was documented, the solution was tested following the Goal Question Metric methodology, utilizing tools such as JMeter, Prometheus and Grafana to collect and visualize the performance data. It was possible to conclude that different caching, batching and execution strategies do indeed have an impact on the GraphQL Federation solution. These impacts shift between positive (improvements in performance) and negative (performance hindered by the strategy) for the
different tested strategies.
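One of the batching strategies such evaluations cover, DataLoader-style request coalescing, can be sketched in Python (a minimal single-threaded illustration; class and function names are ours, not Apollo's API):

```python
# Minimal DataLoader-style batcher: individual key loads queued within one
# "tick" are coalesced into a single batched backend call, turning N
# per-field fetches into one round trip to the subgraph or database.
class BatchLoader:
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn   # resolves a list of keys in one call
        self.queue = []

    def load(self, key):
        self.queue.append(key)     # defer; do not fetch yet

    def dispatch(self):
        keys, self.queue = self.queue, []
        results = self.batch_fn(keys)       # one round trip instead of N
        return dict(zip(keys, results))

calls = []
def fetch_users(ids):
    calls.append(ids)                       # stand-in for a backend query
    return [f"user-{i}" for i in ids]

loader = BatchLoader(fetch_users)
loader.load(1); loader.load(2)
print(loader.dispatch(), len(calls))  # {1: 'user-1', 2: 'user-2'} 1
```

Whether such coalescing helps or hurts depends on workload shape, which is consistent with the thesis' finding that batching strategies can move performance in either direction.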