4,309 research outputs found

    Extending Eventually Consistent Cloud Databases for Enforcing Numeric Invariants

    Geo-replicated databases often operate under the principle of eventual consistency to offer high availability with low latency on a simple key/value store abstraction. Recently, some have adopted commutative data types to provide seamless reconciliation for special-purpose data types, such as counters. Despite this, the inability to enforce numeric invariants across all replicas remains a key shortcoming of relying on the limited guarantees of eventually consistent storage. We present a new replicated data type, called the bounded counter, which adds support for numeric invariants to eventually consistent geo-replicated databases. We describe how it can be implemented on top of existing cloud stores without modifying them, using Riak as an example. Our approach adapts ideas from escrow transactions to devise a solution that is decentralized, fault-tolerant and fast. Our evaluation shows much lower latency and better scalability than the traditional approach of using strong consistency to enforce numeric invariants, thus alleviating the tension between consistency and availability.
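
    To make the escrow idea concrete, the following is a minimal sketch (not the paper's Riak-based implementation; class and method names are illustrative): each replica holds a share of the "decrement rights" for a counter constrained to stay non-negative, decrements succeed locally only while rights remain, and spare rights can be transferred to a replica that runs out.

```python
# Minimal sketch of an escrow-style bounded counter (illustrative only).
# Invariant: the global value never drops below zero, because no replica
# can apply more decrements locally than the rights it currently holds.

class BoundedCounterReplica:
    def __init__(self, replica_id, initial_rights):
        self.replica_id = replica_id
        self.rights = initial_rights   # decrements this replica may apply without coordination
        self.increments = 0            # locally originated increments
        self.decrements = 0            # locally originated decrements

    def increment(self, n=1):
        self.increments += n
        self.rights += n               # newly added units can be spent locally

    def decrement(self, n=1):
        if n > self.rights:
            return False               # would risk violating the global lower bound
        self.rights -= n
        self.decrements += n
        return True

    def transfer_rights(self, other, n):
        # Move spare decrement rights to another replica (e.g., one that ran out).
        if n > self.rights:
            return False
        self.rights -= n
        other.rights += n
        return True


# Usage: two replicas share an initial stock of 10 units.
a, b = BoundedCounterReplica("A", 6), BoundedCounterReplica("B", 4)
assert a.decrement(6) and not a.decrement(1)    # A exhausted its rights
assert b.transfer_rights(a, 2) and a.decrement(2)
```

    Because rights are only moved, never duplicated, replicas can serve decrements without synchronizing on each operation while still respecting the numeric invariant.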

    A systems thinking approach to business intelligence solutions based on cloud computing

    Thesis (S.M. in System Design and Management)--Massachusetts Institute of Technology, Engineering Systems Division, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 73-74). Business intelligence is the set of tools, processes, practices and people that are used to take advantage of information to support decision making in organizations. Cloud computing is a new paradigm for offering computing resources that work on demand, are scalable and are charged by the time they are used. Organizations can save large amounts of money and effort using this approach. This document identifies the main challenges companies encounter while working on business intelligence applications in the cloud, such as security, availability, performance, integration, regulatory issues, and constraints on network bandwidth. All these challenges are addressed with a systems thinking approach, and several solutions are offered that can be applied according to the organization's needs. An evaluation of the main vendors of cloud computing technology is presented, so that business intelligence developers can identify the available tools and companies they can depend on to migrate or build applications in the cloud. It is demonstrated how business intelligence applications can increase their availability with a cloud computing approach, by decreasing the mean time to recovery (handled by the cloud service provider) and increasing the mean time to failure (achieved by the introduction of more redundancy in the hardware). Innovative mechanisms to improve cloud applications are discussed, such as private, public and hybrid clouds, column-oriented databases, in-memory databases and the Data Warehouse 2.0 architecture. Finally, it is shown how the project management for a business intelligence application can be facilitated with a cloud computing approach. Design structure matrices are dramatically simplified by avoiding unnecessary iterations while sizing, validating, and testing hardware and software resources. by Eumir P. Reyes. S.M. in System Design and Management.
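
    As a back-of-the-envelope illustration of the availability argument (the figures below are hypothetical, not taken from the thesis), steady-state availability can be modeled as MTTF / (MTTF + MTTR), so it improves both when redundancy raises the mean time to failure and when provider-managed recovery lowers the mean time to recovery:

```python
# Illustrative arithmetic only; the numbers are hypothetical, not from the thesis.
# Steady-state availability = MTTF / (MTTF + MTTR).

def availability(mttf_hours: float, mttr_hours: float) -> float:
    return mttf_hours / (mttf_hours + mttr_hours)

on_premises = availability(mttf_hours=1_000, mttr_hours=8)   # ~99.21%
cloud_based = availability(mttf_hours=4_000, mttr_hours=1)   # ~99.98%
print(f"on-premises: {on_premises:.4%}, cloud: {cloud_based:.4%}")
```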

    STAMP: Extensions to the STADEN sequence analysis package for high throughput interactive microsatellite marker design

    Background: Microsatellites (MSs) are DNA markers with high analytical power, which are widely used in population genetics, genetic mapping, and forensic studies. Currently available software solutions for high-throughput MS design (i) have shortcomings in detecting and distinguishing imperfect and perfect MSs, (ii) often lack necessary interactive design steps, and (iii) do not allow for the development of primers for multiplex amplifications. We present a set of new tools implemented as extensions to the STADEN package, which provides the backbone functionality for flexible sequence analysis workflows. The possibility to assemble overlapping reads into unique contigs (provided by the base functionality of the STADEN package) is important to avoid developing redundant markers, a feature missing from most other similar tools. Results: Our extensions to the STADEN package provide the following functionality to facilitate microsatellite (and also minisatellite) marker design: the new modules (i) integrate the state-of-the-art tandem repeat detection and analysis software PHOBOS into workflows, (ii) provide two separate repeat detection steps with different search criteria, one for masking repetitive regions during assembly of sequencing reads and the other for designing repeat-flanking primers for MS candidate loci, (iii) incorporate the widely used primer design program PRIMER3 into STADEN workflows, enabling the interactive design and visualization of flanking primers for microsatellites, and (iv) provide the functionality to find optimal locus and primer-pair combinations for multiplex primer design. Furthermore, our extensions include a module for storing analysis results in an SQLite database, providing a transparent solution for data access from within as well as from outside of the STADEN package. Conclusion: The STADEN package is enhanced by our modules into a highly flexible, high-throughput, interactive tool for conventional and multiplex microsatellite marker design. It gives the user detailed control over the workflow, enabling flexible combinations of manual and automated analysis steps. The software is available under the OpenBSD License [1,2]. The high efficiency of our automated marker design workflow has been confirmed in three microsatellite development projects.
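
    As a toy illustration of what perfect microsatellite detection involves (the STAMP modules delegate repeat detection to PHOBOS and primer design to PRIMER3; the snippet below is not part of either tool), a perfect repeat is a short motif of 1-6 bp occurring in uninterrupted tandem copies:

```python
# Toy illustration only: a regex-based scan for perfect tandem repeats.
# Real pipelines use dedicated tools (e.g., PHOBOS) that also score
# imperfect repeats; this sketch only shows the "perfect repeat" concept.
import re

def find_perfect_microsatellites(seq, min_unit=1, max_unit=6, min_copies=5):
    """Return (start, end, motif, copies) for perfect tandem repeats in `seq`."""
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # First copy of the motif captured, then at least (min_copies - 1) exact repeats.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            copies = (m.end() - m.start()) // unit
            hits.append((m.start(), m.end(), motif, copies))
    return hits

# Example: a (CA)n stretch embedded in flanking sequence.
print(find_perfect_microsatellites("GATTACA" + "CA" * 12 + "GGGTTT"))
```

    Note that such a naive scan can report the same locus at compound unit sizes (e.g., CA and CACA), one of the ambiguities that dedicated repeat-detection software resolves.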

    An In-Depth Investigation of Performance Characteristics of Hyperledger Fabric

    Private permissioned blockchains, such as Hyperledger Fabric, are widely deployed across industry to facilitate cross-organizational processes and promise improved performance compared to their public counterparts. However, the lack of empirical and theoretical results prevents precise prediction of real-world performance. We address this gap by conducting an in-depth performance analysis of Hyperledger Fabric. The paper presents a detailed compilation of various performance characteristics using an enhanced version of the Distributed Ledger Performance Scan (DLPS). Researchers and practitioners alike can use the results as guidelines to better configure and implement their blockchains, and can utilize the DLPS framework to conduct their own measurements.
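
    For readers unfamiliar with this style of measurement, the sketch below shows the generic shape of a throughput/latency benchmark; it is not the DLPS framework or the Fabric SDK, and `submit_tx` is a hypothetical stand-in for whatever client call submits a transaction to the system under test:

```python
# Generic benchmark shape (illustrative only; not DLPS, not the Fabric SDK).
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

def benchmark(submit_tx, total_txs=1_000, workers=16):
    latencies = []

    def one(_):
        t0 = time.perf_counter()
        submit_tx()                               # hypothetical client call
        latencies.append(time.perf_counter() - t0)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(one, range(total_txs)))     # fire transactions concurrently
    elapsed = time.perf_counter() - start

    return {
        "throughput_tps": total_txs / elapsed,
        "latency_avg_s": statistics.mean(latencies),
        "latency_p95_s": statistics.quantiles(latencies, n=20)[18],
    }

# Example against a stand-in workload that just sleeps for 5 ms per "transaction":
print(benchmark(lambda: time.sleep(0.005), total_txs=200, workers=8))
```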

    RDMA mechanisms for columnar data in analytical environments

    Integrated Master's dissertation in Informatics Engineering. The amount of data in information systems is growing constantly and, as a consequence, the complexity of analytical processing is greater. There are several storage solutions to persist this information, with different architectures targeting different use cases. For analytical processing, storage solutions with a column-oriented format are particularly relevant due to the convenient placement of the data in persistent storage and the closer mapping to in-memory processing. Access to the database is typically remote and has overhead associated with it, mainly when it is necessary to obtain the same data multiple times. Thus, it is desirable to have a cache on the processing side, and solutions exist for this. The problem with the existing solutions is the overhead introduced by network latency and memory copies between logical layers. Remote Direct Memory Access (RDMA) mechanisms have the potential to help minimize this overhead. Furthermore, this type of mechanism is indicated for large amounts of data, because zero-copy has more impact as the data volume increases. One of the problems associated with RDMA mechanisms is the complexity of development. This complexity is induced by a development paradigm that differs from that of other network communication protocols, such as TCP. Aiming to improve the efficiency of analytical processing, this dissertation presents a distributed cache that takes advantage of RDMA mechanisms to improve analytical processing performance. The cache abstracts the intricacies of RDMA mechanisms and is developed as middleware, making it transparent to take advantage of this technology. Moreover, this technique could be used in other contexts where a distributed cache makes sense, such as a set of replicated web servers that access the same database.
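
    The following sketch illustrates only the middleware interface described above, not the dissertation's implementation: callers request a column by key and never see the transport, which would be a zero-copy RDMA read in the real system and is a plain callback-backed stub here. All names are illustrative.

```python
# Illustrative middleware-style cache interface (not the dissertation's code).
# The transport is injected as a callback so callers never deal with RDMA details.

class ColumnCache:
    def __init__(self, remote_fetch):
        self._local = {}                    # locally cached column chunks
        self._remote_fetch = remote_fetch   # transport callback (RDMA read in the real design)

    def get(self, table: str, column: str) -> bytes:
        key = (table, column)
        if key not in self._local:          # cache miss: go to the remote store once
            self._local[key] = self._remote_fetch(table, column)
        return self._local[key]

    def invalidate(self, table: str, column: str) -> None:
        self._local.pop((table, column), None)


# Stand-in remote store; a real deployment would fetch the column bytes over RDMA.
backing = {("orders", "price"): b"\x01\x02\x03"}
cache = ColumnCache(lambda t, c: backing[(t, c)])
assert cache.get("orders", "price") == b"\x01\x02\x03"   # first call fetches remotely
assert cache.get("orders", "price") == b"\x01\x02\x03"   # second call is served locally
```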

    Temperature-driven shifts in the epibiotic bacterial community composition of the brown macroalga Fucus vesiculosus

    The thallus surface of the brown macroalga Fucus vesiculosus is covered by a specific biofilm community. This biofilm supposedly plays an important role in the interaction between host and environment. So far, we know little about compositional or functional shifts of this epibiotic bacterial community under changing environmental conditions. In this study, the response of the microbiota to different temperatures, with respect to cell density and community composition, was analyzed by non-culture-based methods (denaturing gradient gel electrophoresis and 454 pyrosequencing of the 16S rRNA gene). Redundancy analysis showed that, despite high variability among host individuals, temperature accounted for 20% of the variation in the bacterial community composition, whereas cell density did not differ between groups. Across all samples, 4341 bacterial operational taxonomic units (OTUs) at a 97% similarity level were identified. Eight percent of the OTUs were significantly correlated with low, medium, or high temperatures. Notably, the family Rhodobacteraceae increased in relative abundance from 20% to 50% with increasing temperature. OTU diversity (evenness and richness) was higher at 15°C than at the lower and higher temperatures. Considering their known and presumed ecological functions for the host, changes in the epibacterial community may entail shifts in the performance of the host alga.
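
    For readers less familiar with the diversity metrics mentioned above, the short sketch below (with made-up counts, not the study's data) computes OTU richness, Shannon diversity, and Pielou's evenness, the kind of quantities behind the reported evenness and richness comparison:

```python
# Worked illustration with hypothetical counts (not the study's data).
# Richness = number of OTUs observed; Shannon H' = -sum(p_i * ln p_i);
# Pielou's evenness J' = H' / ln(richness).
import math

def diversity(otu_counts):
    counts = [c for c in otu_counts if c > 0]
    total = sum(counts)
    richness = len(counts)
    shannon = -sum((c / total) * math.log(c / total) for c in counts)
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return richness, shannon, evenness

# Hypothetical sample: one dominant OTU and four rarer ones; dominance lowers evenness.
print(diversity([500, 120, 90, 60, 30]))
```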

    Proceedings of the real-time database workshop, Eindhoven, 23 February 1995
