
    Blazes: Coordination Analysis for Distributed Programs

    Distributed consistency is perhaps the most discussed topic in distributed systems today. Coordination protocols can ensure consistency, but in practice they degrade performance unless used judiciously. Scalable distributed architectures avoid coordination whenever possible, but under-coordinated systems can exhibit behavioral anomalies under fault, which are often extremely difficult to debug. This raises significant challenges for distributed system architects and developers. In this paper we present Blazes, a cross-platform program analysis framework that (a) identifies program locations that require coordination to ensure consistent executions, and (b) automatically synthesizes application-specific coordination code that can significantly outperform general-purpose techniques. We present two case studies, one using annotated programs in the Twitter Storm system, and another using the Bloom declarative language.
    Comment: Updated to include additional materials from the original technical report: derivation rules, output stream label
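The core idea of an analysis like this can be illustrated with a toy sketch. The code below is not Blazes' actual rule set; it assumes, for illustration only, that operators carry a monotonicity annotation and that a non-monotonic operator consuming a nondeterministically ordered stream is the point where coordination (e.g., sealing or ordering) must be injected:

```python
# Illustrative sketch (invented names, not Blazes' real rules): flag the
# pipeline locations where a non-monotonic operator consumes an input
# stream whose delivery order is nondeterministic.

def coordination_points(pipeline):
    """pipeline: list of (operator_name, is_monotonic) annotations,
    in dataflow order. Returns operators needing coordination."""
    points = []
    ordered = False  # has upstream coordination already fixed an order?
    for name, is_monotonic in pipeline:
        if not is_monotonic and not ordered:
            points.append(name)  # inject ordering/sealing here
            ordered = True       # downstream now sees a deterministic order
    return points

pipeline = [("filter", True), ("aggregate", False), ("join", False)]
print(coordination_points(pipeline))  # ['aggregate']
```

Monotonic operators (like `filter`) tolerate any arrival order, so only the first non-monotonic operator forces coordination; everything downstream inherits the order it establishes.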

    Knowledge-infused and Consistent Complex Event Processing over Real-time and Persistent Streams

    Emerging applications in Internet of Things (IoT) and Cyber-Physical Systems (CPS) present novel challenges to Big Data platforms for performing online analytics. Ubiquitous sensors from IoT deployments are able to generate data streams at high velocity that include information from a variety of domains and accumulate to large volumes on disk. Complex Event Processing (CEP) is recognized as an important real-time computing paradigm for analyzing continuous data streams. However, existing work on CEP is largely limited to relational query processing, exposing two distinctive gaps for query specification and execution: (1) infusing the relational query model with higher-level knowledge semantics, and (2) seamless query evaluation across temporal spaces that span past, present and future events. Closing these gaps enables accessible analytics over data streams with properties from different disciplines, and helps span the velocity (real-time) and volume (persistent) dimensions. In this article, we introduce a Knowledge-infused CEP (X-CEP) framework that provides domain-aware knowledge query constructs along with temporal operators that allow end-to-end queries to span across real-time and persistent streams. We translate this query model to efficient query execution over online and offline data streams, proposing several optimizations to mitigate the overheads introduced by evaluating semantic predicates and by accessing high-volume historic data streams. The proposed X-CEP query model and execution approaches are implemented in our prototype semantic CEP engine, SCEPter. We validate our query model using domain-aware CEP queries from a real-world Smart Power Grid application, and experimentally analyze the benefits of our optimizations for executing these queries, using event streams from a campus-microgrid IoT deployment.
    Comment: 34 pages, 16 figures, accepted in Future Generation Computer Systems, October 27, 201
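The two gaps the abstract names can be sketched in a few lines. This is a hypothetical illustration, not X-CEP's query language: a single query iterates seamlessly over a persisted (past) stream and a live (present) stream, and its predicate consults a small knowledge base rather than raw sensor values alone:

```python
from itertools import chain

# Hypothetical sketch of the idea, not X-CEP's actual constructs:
# KNOWLEDGE stands in for a domain ontology mapping sensors to concepts.
KNOWLEDGE = {"bldg-7": "laboratory", "bldg-2": "dormitory"}

def query(persisted, live, building_type, threshold):
    # Seamless temporal span: evaluate over past events, then live ones.
    for event in chain(persisted, live):
        # Semantic predicate: classify the source via the knowledge base.
        if KNOWLEDGE.get(event["bldg"]) == building_type and event["kw"] > threshold:
            yield event

past = [{"bldg": "bldg-7", "kw": 12.0}, {"bldg": "bldg-2", "kw": 30.0}]
now  = [{"bldg": "bldg-7", "kw": 25.0}]
print([e["kw"] for e in query(past, now, "laboratory", 10.0)])  # [12.0, 25.0]
```

The optimizations the article proposes target exactly the two costs visible here: the knowledge-base lookup on every event, and the scan over the high-volume persisted prefix.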

    Theory and Practice of Transactional Method Caching

    Nowadays, tiered architectures are widely accepted for constructing large-scale information systems. In this context, application servers often form the bottleneck for a system's efficiency. An application server exposes an object-oriented interface consisting of a set of methods which are accessed by potentially remote clients. The idea of method caching is to store results of read-only method invocations with respect to the application server's interface on the client side. If the client invokes the same method with the same arguments again, the corresponding result can be taken from the cache without contacting the server. It has been shown that this approach can considerably improve a real-world system's efficiency. This paper extends the concept of method caching by addressing the case where clients wrap related method invocations in ACID transactions. Demarcating sequences of method calls in this way is supported by many important application server standards. In this context the paper presents an architecture, a theory and an efficient protocol for maintaining full transactional consistency, and in particular serializability, when using a method cache on the client side. In order to create a protocol for scheduling cached method results, the paper extends a classical transaction formalism. Based on this extension, a recovery protocol and an optimistic serializability protocol are derived. The latter differs from traditional transactional cache protocols in many essential ways. An efficiency experiment validates the approach: using the cache, a system's performance and scalability are considerably improved.
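The basic (non-transactional) form of method caching described above is easy to sketch. The names below are invented for illustration; the paper's contribution is the transactional protocol layered on top of this idea, which this sketch deliberately omits:

```python
# Minimal sketch of client-side method caching (hypothetical API):
# results of read-only invocations are keyed by (method, arguments),
# so a repeat call skips the server round trip entirely.

class MethodCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def invoke(self, method, *args):
        key = (method, args)
        if key not in self.cache:                       # miss: contact server
            self.cache[key] = getattr(self.server, method)(*args)
        return self.cache[key]                          # hit: no round trip

    def invalidate(self):
        # The paper's transactional protocol invalidates selectively to
        # preserve serializability; this sketch just drops everything.
        self.cache.clear()

class Server:                     # stand-in for a remote application server
    def __init__(self):
        self.calls = 0
    def get_user(self, uid):
        self.calls += 1
        return f"user-{uid}"

srv = MethodCache(Server())
print(srv.invoke("get_user", 7), srv.server.calls)  # user-7 1
print(srv.invoke("get_user", 7), srv.server.calls)  # user-7 1  (cached)
```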

    LogBase: A Scalable Log-structured Database System in the Cloud

    Numerous applications such as financial transactions (e.g., stock trading) are write-heavy in nature. The shift from reads to writes in web applications has also been accelerating in recent years. Write-ahead logging is a common approach for providing recovery capability while improving performance in most storage systems. However, the separation of log and application data incurs write overhead that is especially pronounced in write-heavy environments, and hence adversely affects the write throughput and recovery time of the system. In this paper, we introduce LogBase - a scalable log-structured database system that adopts log-only storage for removing the write bottleneck and supporting fast system recovery. LogBase is designed to be dynamically deployed on commodity clusters to take advantage of the elastic scaling property of cloud environments. LogBase provides in-memory multiversion indexes for supporting efficient access to data maintained in the log. LogBase also supports transactions that bundle read and write operations spanning multiple records. We implemented the proposed system and compared it with HBase and a disk-based log-structured record-oriented system modeled after RAMCloud. The experimental results show that LogBase is able to provide sustained write throughput, efficient data access out of the cache, and effective system recovery.
    Comment: VLDB201
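The log-only idea, plus the in-memory multiversion index that makes reads from a log practical, can be sketched in a few lines. This is an illustrative toy, not LogBase's actual storage layout: every write is a single sequential append, and the index maps each key to its versions' log offsets:

```python
# Sketch of log-only storage with an in-memory multiversion index
# (invented structure for illustration, not LogBase's real design).

class LogStore:
    def __init__(self):
        self.log = []        # the log IS the only data store
        self.index = {}      # key -> [(ts, log_offset), ...], newest last

    def put(self, key, value, ts):
        offset = len(self.log)
        self.log.append((key, value, ts))   # one sequential write, no page update
        self.index.setdefault(key, []).append((ts, offset))

    def get(self, key, ts=None):
        # Newest version at or before ts (ts=None means latest).
        for vts, off in reversed(self.index.get(key, [])):
            if ts is None or vts <= ts:
                return self.log[off][1]
        return None

s = LogStore()
s.put("x", 1, ts=10)
s.put("x", 2, ts=20)
print(s.get("x"))         # 2
print(s.get("x", ts=15))  # 1
```

Recovery also falls out of this design: since the log is the database, rebuilding the in-memory index by scanning the log is all that restart requires.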

    Serializable Isolation for Snapshot Databases

    Many popular database management systems implement a multiversion concurrency control algorithm called snapshot isolation rather than providing full serializability based on locking. There are well-known anomalies permitted by snapshot isolation that can lead to violations of data consistency by interleaving transactions that would maintain consistency if run serially. Until now, the only way to prevent these anomalies was to modify the applications by introducing explicit locking or artificial update conflicts, following careful analysis of conflicts between all pairs of transactions. This thesis describes a modification to the concurrency control algorithm of a database management system that automatically detects and prevents snapshot isolation anomalies at runtime for arbitrary applications, thus providing serializable isolation. The new algorithm preserves the properties that make snapshot isolation attractive, including that readers do not block writers and vice versa. An implementation of the algorithm in a relational database management system is described, along with a benchmark and performance study, showing that the throughput approaches that of snapshot isolation in most cases.
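The detection idea can be sketched in miniature. A greatly simplified version of this style of runtime check tracks read-write antidependency edges and conservatively aborts any transaction that becomes the pivot of two consecutive edges (the structure known to be necessary for a snapshot isolation anomaly); the flag names here are illustrative, not the thesis's exact bookkeeping:

```python
# Toy sketch of runtime anomaly detection under snapshot isolation
# (simplified; invented names). Each transaction tracks whether it has
# an incoming and an outgoing rw-antidependency edge.

class Txn:
    def __init__(self, name):
        self.name = name
        self.in_conflict = False   # some txn read what this txn overwrote
        self.out_conflict = False  # this txn read what some txn overwrote

def record_rw_edge(reader, writer):
    """reader read a version that writer subsequently overwrote.
    Returns the names of transactions to abort (conservatively)."""
    reader.out_conflict = True
    writer.in_conflict = True
    # A pivot with both flags set may sit on a serialization cycle.
    return [t.name for t in (reader, writer)
            if t.in_conflict and t.out_conflict]

t1, t2, t3 = Txn("T1"), Txn("T2"), Txn("T3")
record_rw_edge(t1, t2)         # T1 -rw-> T2: no pivot yet
print(record_rw_edge(t2, t3))  # T2 -rw-> T3: T2 is the pivot -> ['T2']
```

The check is conservative (some aborted transactions would have been safe), which is why the reported throughput only approaches, rather than matches, plain snapshot isolation.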

    Cache Serializability: Reducing Inconsistency in Edge Transactions

    Read-only caches are widely used in cloud infrastructures to reduce access latency and load on backend databases. Operators view coherent caches as impractical at genuinely large scale, and many client-facing caches are updated asynchronously with best-effort pipelines. Existing solutions that support cache consistency are inapplicable to this scenario since they require a round trip to the database on every cache transaction. Existing incoherent cache technologies are oblivious to transactional data access, even if the backend database supports transactions. We propose T-Cache, a novel caching policy for read-only transactions in which inconsistency is tolerable (won't cause safety violations) but undesirable (has a cost). T-Cache improves cache consistency despite asynchronous and unreliable communication between the cache and the database. We define cache-serializability, a variant of serializability that is suitable for incoherent caches, and prove that with unbounded resources T-Cache implements this new specification. With limited resources, T-Cache allows the system manager to choose a trade-off between performance and consistency. Our evaluation shows that T-Cache detects many inconsistencies with only nominal overhead. We use synthetic workloads to demonstrate the efficacy of T-Cache when data accesses are clustered and its adaptive reaction to workload changes. With workloads based on real-world topologies, T-Cache detects 43-70% of the inconsistencies and increases the rate of consistent transactions by 33-58%.
    Comment: Ittay Eyal, Ken Birman, Robbert van Renesse, "Cache Serializability: Reducing Inconsistency in Edge Transactions," Distributed Computing Systems (ICDCS), IEEE 35th International Conference on, June~29 2015--July~2 201
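One way to picture the kind of check such a cache can run is sketched below. This is an assumed, heavily simplified mechanism for illustration, not T-Cache's actual algorithm: each cached object carries a version timestamp, and a read-only transaction is flagged inconsistent when the versions it read could not have coexisted in any single snapshot:

```python
# Hypothetical sketch (invented data model, not T-Cache's design):
# reads maps key -> version timestamp the transaction observed;
# dependencies records, per key, the minimum version of a companion key
# that was current when this key's version was written.

def consistent(reads, dependencies):
    for key, deps in dependencies.items():
        for other, min_version in deps.items():
            if key in reads and other in reads and reads[other] < min_version:
                return False   # stale companion: not one snapshot
    return True

reads = {"profile": 7, "wall": 3}
deps  = {"profile": {"wall": 5}}   # profile v7 was written after wall v5
print(consistent(reads, deps))     # False: wall v3 predates wall v5
```

The cost/benefit trade-off in the paper corresponds to how much of this dependency metadata the cache can afford to retain with limited resources.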

    Exploring Performance with Apollo Federation

    The growing tendency toward cloud-hosted computing and availability supported a shift in software architecture to better take advantage of such technological advancements. As Monolithic Architecture evolved and matured, businesses grew their dependency on software solutions, which motivated the shift to Microservice Architecture. The same shift is comparable to the evolution of monolithic GraphQL solutions which, through their growth and evolution, also required a way forward in solving some of their bottleneck issues. One of the alternatives, already chosen and proven by some enterprises, is GraphQL Federation. Due to its novelty, there is still a lack of knowledge and testing regarding the performance of the GraphQL Federation architecture and how techniques such as caching strategies, batching and execution strategies impact it. This thesis aims to address this gap by first contextualizing the different aspects of GraphQL and GraphQL Federation and investigating the available and documented enterprise scenarios to extract best practices and to better understand how to prepare such a performance evaluation. Next, multiple alternatives underwent the Analytic Hierarchy Process to choose the best way to develop a scenario that enables the performance analysis in a standard and structured way. Following this, the alternative base solutions were analysed and compared to determine the best fit for the current thesis. Functional and non-functional requirements were collected along with the rest of the design exercise to enhance the solution to be tested for performance. Finally, after the required development and implementation work was documented, the solution was tested following the Goal Question Metric methodology, utilizing tools such as JMeter, Prometheus and Grafana to collect and visualize the performance data.
    It was possible to conclude that different caching, batching and execution strategies do indeed have an impact on the GraphQL Federation solution. These impacts shift between positive (improvements in performance) and negative (performance hindered by the strategy) across the different tested strategies.
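One of the techniques the thesis evaluates is batching. As a hypothetical sketch (the names and API below are invented, not taken from the thesis), a DataLoader-style loader coalesces the per-entity lookups that federated resolvers would otherwise issue one at a time against a subgraph:

```python
# Illustrative DataLoader-style batching sketch (invented names):
# resolvers enqueue keys, and one downstream request replaces N.

class BatchLoader:
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn   # fetches many keys in a single request
        self.pending = []

    def load(self, key):
        self.pending.append(key)   # queue instead of fetching immediately

    def dispatch(self):
        # One subgraph call instead of len(self.pending) separate calls.
        results = self.batch_fn(self.pending)
        self.pending = []
        return results

def fetch_users(ids):              # stand-in for a federated subgraph call
    return {i: f"user-{i}" for i in ids}

loader = BatchLoader(fetch_users)
for uid in (1, 2, 3):
    loader.load(uid)
print(loader.dispatch())  # {1: 'user-1', 2: 'user-2', 3: 'user-3'}
```

Whether batching helps or hurts in a given deployment is exactly the kind of shift between positive and negative impact the thesis measures.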