1,884 research outputs found

    Model-Based Proactive Read-Validation in Transaction Processing Systems

    Get PDF
    Concurrency control protocols based on read-validation schemes allow transactions which are doomed to abort to still run until a subsequent validation check reveals them as invalid. These late aborts do not favor the reduction of wasted computation and can penalize performance. To counteract this problem, we present an analytical model that predicts the abort probability of transactions handled via read-validation schemes. Our goal is to determine what are the suited points-along a transaction lifetime-to carry out a validation check. This may lead to early aborting doomed transactions, thus saving CPU time. We show how to exploit the abort probability predictions returned by the model in combination with a threshold-based scheme to trigger read-validations. We also show how this approach can definitely improve performance-leading up to 14 % better turnaround-as demonstrated by some experiments carried out with a port of the TPC-C benchmark to Software Transactional Memory

    A Combined Analytical Modeling Machine Learning Approach for Performance Prediction of MapReduce Jobs in Hadoop Clusters

    Get PDF
    Nowadays MapReduce and its open source implementation, Apache Hadoop, are the most widespread solutions for handling massive dataset on clusters of commodity hardware. At the expense of a somewhat reduced performance in comparison to HPC technologies, the MapReduce framework provides fault tolerance and automatic parallelization without any efforts by developers. Since in many cases Hadoop is adopted to support business critical activities, it is often important to predict with fair confidence the execution time of submitted jobs, for instance when SLAs are established with end-users. In this work, we propose and validate a hybrid approach exploiting both queuing networks and support vector regression, in order to achieve a good accuracy without too many costly experiments on a real setup. The experimental results show how the proposed approach attains a 21% improvement in accuracy over applying machine learning techniques without any support from analytical models

    Transactions and data management in NoSQL cloud databases

    Get PDF
    NoSQL databases have become the preferred option for storing and processing data in cloud computing as they are capable of providing high data availability, scalability and efficiency. But in order to achieve these attributes, NoSQL databases make certain trade-offs. First, NoSQL databases cannot guarantee strong consistency of data. They only guarantee a weaker consistency which is based on eventual consistency model. Second, NoSQL databases adopt a simple data model which makes it easy for data to be scaled across multiple nodes. Third, NoSQL databases do not support table joins and referential integrity which by implication, means they cannot implement complex queries. The combination of these factors implies that NoSQL databases cannot support transactions. Motivated by these crucial issues this thesis investigates into the transactions and data management in NoSQL databases. It presents a novel approach that implements transactional support for NoSQL databases in order to ensure stronger data consistency and provide appropriate level of performance. The novelty lies in the design of a Multi-Key transaction model that guarantees the standard properties of transactions in order to ensure stronger consistency and integrity of data. The model is implemented in a novel loosely-coupled architecture that separates the implementation of transactional logic from the underlying data thus ensuring transparency and abstraction in cloud and NoSQL databases. The proposed approach is validated through the development of a prototype system using real MongoDB system. An extended version of the standard Yahoo! Cloud Services Benchmark (YCSB) has been used in order to test and evaluate the proposed approach. Various experiments have been conducted and sets of results have been generated. The results show that the proposed approach meets the research objectives. It maintains stronger consistency of cloud data as well as appropriate level of reliability and performance

    Cloud transactions and caching for improved performance in clouds and DTNs

    Get PDF
    In distributed transactional systems deployed over some massively decentralized cloud servers, access policies are typically replicated. Interdependencies ad inconsistencies among policies need to be addressed as they can affect performance, throughput and accuracy. Several stringent levels of policy consistency constraints and enforcement approaches to guarantee the trustworthiness of transactions on cloud servers are proposed. We define a look-up table to store policy versions and the concept of Tree-Based Consistency approach to maintain a tree structure of the servers. By integrating look-up table and the consistency tree based approach, we propose an enhanced version of Two-phase validation commit (2PVC) protocol integrated with the Paxos commit protocol with reduced or almost the same performance overhead without affecting accuracy and precision. A new caching scheme has been proposed which takes into consideration Military/Defense applications of Delay-tolerant Networks (DTNs) where data that need to be cached follows a whole different priority levels. In these applications, data popularity can be defined not only based on request frequency, but also based on the importance like who created and ranked point of interests in the data, when and where it was created; higher rank data belonging to some specific location may be more important though frequency of those may not be higher than more popular lower priority data. Thus, our caching scheme is designed by taking different requirements into consideration for DTN networks for defense applications. The performance evaluation shows that our caching scheme reduces the overall access latency, cache miss and usage of cache memory when compared to using caching schemes --Abstract, page iv

    High performance data processing

    Get PDF
    Dissertação de mestrado em Informatics EngeneeringÀ medida que as aplicações atingem uma maior quantidade de utilizadores, precisam de processar uma crescente quantidade de pedidos. Para além disso, precisam de muitas vezes satisfazer pedidos de utilizadores de diferentes partes do globo, onde as latências de rede têm um impacto significativo no desempenho em instalações monolíticas. Portanto, distribuição é uma solução muito procurada para melhorar a performance das camadas aplicacional e de dados. Contudo, distribuir dados não é uma tarefa simples se pretendemos assegurar uma forte consistência. Isto leva a que muitos sistemas de base de dados dependam de protocolos de sincronização pesados, como two-phase commit, consenso distribuído, bloqueamento distribuído, entre outros, enquanto que outros sistemas dependem em consistência fraca, não viável para alguns casos de uso. Esta tese apresenta o design, implementação e avaliação de duas soluções que têm como objetivo reduzir o impacto de assegurar garantias de forte consistência em sistemas de base de dados, especialmente aqueles distribuídos pelo globo. A primeira é o Primary Semi-Primary, uma arquitetura de base de dados distribuída com total replicação que permite que as réplicas evoluam independentemente, para evitar que os clientes precisem de esperar que escritas precedentes que não geram conflitos sejam propagadas. Apesar das réplicas poderem processar tanto leituras como escritas, melhorando a escalabilidade, o sistema continua a oferecer garantias de consistência forte, através do envio da certificação de transações para um nó central. O seu design é independente de modelos de dados, mas a sua implementação pode tirar partido do controlo de concorrência nativo oferecido por algumas base de dados, como é mostrado na implementação usando PostgreSQL e o seu Snapshot Isolation. Os resultados apresentam várias vantagens tanto em ambientes locais como globais. A segunda solução são os Multi-Record Values, uma técnica que particiona dinâmicamente valores numéricos em múltiplos registros, permitindo que escritas concorrentes possam executar com uma baixa probabilidade de colisão, reduzindo a taxa de abortos e/ou contenção na adquirição de locks. Garantias de limites inferiores, exigido por objetos como saldos bancários ou inventários, são assegurados por esta estratégia, ao contrário de muitas outras alternativas. O seu design é também indiferente do modelo de dados, sendo que as suas vantagens podem ser encontradas em sistemas SQL e NoSQL, bem como distribuídos ou centralizados, tal como apresentado na secção de avaliação.As applications reach an wider audience that ever before, they must process larger and larger amounts of requests. In addition, they often must be able to serve users all over the globe, where network latencies have a significant negative impact on monolithic deployments. Therefore, distribution is a well sought-after solution to improve performance of both applicational and database layers. However, distributing data is not an easy task if we want to ensure strong consistency guarantees. This leads many databases systems to rely on expensive synchronization controls protocols such as two-phase commit, distributed consensus, distributed locking, among others, while other systems rely on weak consistency, unfeasible for some use cases. This thesis presents the design, implementation and evaluation of two solutions aimed at reducing the impact of ensuring strong consistency guarantees on database systems, especially geo-distributed ones. The first is the Primary Semi-Primary, a full replication distributed database architecture that allows different replicas to evolve independently, to avoid that clients wait for preceding non-conflicting updates. Al though replicas can process both reads and writes, improving scalability, the system still ensures strong consistency guarantees, by relaying transactions’ certifications to a central node. Its design is independent of the underlying data model, but its implementation can take advantage of the native concurrency control offered by some systems, as is exemplified by an implementation using PostgreSQL and its Snapshot Isolation. The results present several advantages in both throughput and response time, when comparing to other alternative architectures, in both local and geo-distributed environments. The second solution is the Multi-Record Values, a technique that dynami cally partitions numeric values into multiple records, allowing concurrent writes to execute with low conflict probability, reducing abort rate and/or locking contention. Lower limit guarantees, required by objects such as balances or stocks, are ensure by this strategy, unlike many other similar alternatives. Its design is also data model agnostic, given its advantages can be found in both SQL and NoSQL systems, as well as both centralized and distributed database, as presented in the evaluation section

    Robustness Against Transactional Causal Consistency

    Get PDF
    Distributed storage systems and databases are widely used by various types of applications. Transactional access to these storage systems is an important abstraction allowing application programmers to consider blocks of actions (i.e., transactions) as executing atomically. For performance reasons, the consistency models implemented by modern databases are weaker than the standard serializability model, which corresponds to the atomicity abstraction of transactions executing over a sequentially consistent memory. Causal consistency for instance is one such model that is widely used in practice. In this paper, we investigate application-specific relationships between several variations of causal consistency and we address the issue of verifying automatically if a given transactional program is robust against causal consistency, i.e., all its behaviors when executed over an arbitrary causally consistent database are serializable. We show that programs without write-write races have the same set of behaviors under all these variations, and we show that checking robustness is polynomial time reducible to a state reachability problem in transactional programs over a sequentially consistent shared memory. A surprising corollary of the latter result is that causal consistency variations which admit incomparable sets of behaviors admit comparable sets of robust programs. This reduction also opens the door to leveraging existing methods and tools for the verification of concurrent programs (assuming sequential consistency) for reasoning about programs running over causally consistent databases. Furthermore, it allows to establish that the problem of checking robustness is decidable when the programs executed at different sites are finite-state
    corecore