369 research outputs found

    A Survey of Traditional and Practical Concurrency Control in Relational Database Management Systems

    Get PDF
    Traditionally, database theory has focused on concepts such as atomicity and serializability, asserting that concurrent transaction management must enable correctness above all else. Textbooks and academic journals detail a vision of unbounded rationality, where reduced throughput because of concurrency protocols is not of tremendous concern. This thesis seeks to survey the traditional basis for concurrency in relational database management systems and contrast it with actual practice. SQL-92, the current standard for concurrency in relational database management systems, defines isolation (allowable concurrency) levels, and these are examined. Some ways in which DB2, a popular database, interprets these levels and finesses extra concurrency through performance enhancement are detailed. SQL-92 standardizes de facto relational database management system features. Given this and a superabundance of articles in professional journals detailing steps for fine-tuning transaction concurrency, the prospects for performance tuning seem bright, even at the expense of serializability. Are the practical changes wrought by non-academic professionals killing traditional database concurrency ideals? Not really. Reasoned changes for performance gains advocate compromise, using complex concurrency controls when necessary for the job at hand and relaxing standards otherwise. The idea of relational database management systems is only twenty years old, and standards are still evolving. Is there still an interplay between tradition and practice? Of course. Current practice uses tradition pragmatically, not idealistically. Academic ideas help drive the systems available for use, and perhaps current practice will now help academic ideas define concurrency control concepts for relational database management systems.
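    As a concrete illustration of the isolation levels the survey discusses, the hedged sketch below uses the standard JDBC API to relax a transaction from the textbook serializable setting to read committed in exchange for extra concurrency; the connection URL, credentials and the accounts table are hypothetical, and nothing here is specific to the surveyed systems.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class IsolationDemo {
    public static void main(String[] args) throws SQLException {
        // Hypothetical DB2 connection URL and credentials.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:db2://localhost:50000/SAMPLE", "user", "pass")) {
            conn.setAutoCommit(false);
            // TRANSACTION_SERIALIZABLE gives the textbook guarantee; READ_COMMITTED
            // relaxes it for extra concurrency, as the tuning literature suggests.
            conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT balance FROM accounts WHERE id = 1")) {
                while (rs.next()) {
                    System.out.println("balance = " + rs.getBigDecimal("balance"));
                }
            }
            conn.commit();
        }
    }
}
```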

    Transactional filesystems

    Get PDF
    Master's dissertation in Informatics Engineering. The task of implementing correct software is not trivial, especially when facing the need to support concurrency. To overcome this difficulty, several researchers have proposed providing the well-known database transactional models as an abstraction for existing programming languages, allowing a software programmer to define groups of computations as transactions and benefit from the expected semantics of the underlying transactional model. Prototypes for this programming model are nowadays made available by many research teams, but they are still far from perfect due to a considerable number of operational restrictions. Mostly, these restrictions derive from limitations on the use of input-output functions inside a transaction. These functions are frequently irreversible, which makes them incompatible with a transactional engine, since it cannot undo their effects when a transaction aborts. However, there is a group of input-output operations that are potentially reversible and that can become a valuable tool when provided within the transactional programming model described above: file system operations. A programming model that involves in a transaction not only a set of memory operations but also a set of file operations would allow the software programmer to define algorithms in a much more flexible and simpler way, achieving greater stability and consistency in each application. In this document we propose to specify and support the use of this type of operation inside a transactional programming model, as well as to study the advantages and disadvantages of this approach.
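    A minimal sketch of the idea described above, assuming a hypothetical FileTransaction class (not the dissertation's prototype): each file write records an undo action so that its effect can be reversed if the enclosing transaction aborts.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: pair each file write with an undo action so the effect
// can be rolled back if the enclosing transaction aborts.
public class FileTransaction {
    private final Deque<Runnable> undoLog = new ArrayDeque<>();

    public void write(Path file, byte[] newContent) throws IOException {
        byte[] old = Files.exists(file) ? Files.readAllBytes(file) : null;
        undoLog.push(() -> {
            try {
                if (old == null) Files.deleteIfExists(file);
                else Files.write(file, old);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        Files.write(file, newContent);
    }

    public void commit() { undoLog.clear(); }

    public void abort() {
        while (!undoLog.isEmpty()) undoLog.pop().run();  // undo in reverse order
    }
}
```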

    Optimizing recovery protocols for replicated database systems

    Full text link
    Nowadays, information technology and computing systems have great relevance in our lives. Among current computer systems, distributed systems are among the most important because of their scalability, fault tolerance, performance improvements and high availability. Replicated systems are a specific case of distributed systems. This Ph.D. thesis is centered on the field of replicated databases due to their widespread usage, which requires, among other properties, low response times, high throughput, load balancing among replicas, data consistency, data integrity and fault tolerance. In this scope, the development of applications that use replicated databases raises problems that can be reduced by using other fault-tolerant building blocks, such as group communication and membership services. Thus, using the services provided by group communication systems (GCS) hides several communication details, simplifying the design of replication and recovery protocols. This Ph.D. thesis surveys the alternatives and strategies used in replication and recovery protocols for database replication systems. It also summarizes different concepts about group communication systems and virtual synchrony. As a result, the thesis provides a classification of database replication protocols according to their support for (and interaction with) recovery protocols, always assuming that both kinds of protocol rely on a GCS. Since current commercial DBMSs allow programmers and database administrators to sacrifice consistency with the aim of improving performance, it is important to select the appropriate level of consistency. Regarding (replicated) databases, consistency is strongly related to the isolation levels assigned to transactions. One of the main proposals of this thesis is a recovery protocol for a replication protocol based on certification. Certification-based database replication protocols provide a good basis for the development of their recovery strategies when a snapshot isolation level is assumed. At that level, readsets are not needed in the validation step and, as a result, do not need to be transmitted to other replicas. Additionally, these protocols hold a writeset list that is used in the certification/validation step; that list maintains the set of writesets needed by the recovery protocol. This thesis evaluates the performance of a recovery protocol based on the writeset-list transfer (basic protocol) and of an optimized version that compacts the information to be transferred. The second proposal applies the compaction principle to a recovery protocol designed for weak-voting replication protocols. Its aim is to minimize the time needed for transferring and applying the writesets lost by the recovering replica, obtaining in this way an efficient recovery. The performance of this recovery algorithm has been checked with a simulator built on the Omnet++ simulation framework. The simulation results confirm that this recovery protocol provides good results in multiple scenarios. Finally, the correctness of both recovery protocols is also justified and presented in Chapter 5.
    García Muñoz, LH. (2013). Optimizing recovery protocols for replicated database systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/31632
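    The sketch below illustrates, under stated assumptions, the certification step such protocols build on: with snapshot isolation only writesets are checked, and the retained writeset history doubles as the state a recovering replica must apply. Class and method names are illustrative and are not the thesis middleware.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative certification under snapshot isolation: a transaction commits only
// if its writeset does not overlap any writeset committed after its snapshot was
// taken. The retained history is also what a recovering replica must re-apply.
public class Certifier {
    record WriteSet(long commitTimestamp, Set<String> keys) {}

    private final List<WriteSet> history = new ArrayList<>();
    private long lastCommitted = 0;

    public synchronized boolean certify(long snapshotTs, Set<String> writeKeys) {
        for (WriteSet ws : history) {
            if (ws.commitTimestamp() > snapshotTs
                    && !Collections.disjoint(ws.keys(), writeKeys)) {
                return false;                       // write-write conflict: abort
            }
        }
        history.add(new WriteSet(++lastCommitted, new HashSet<>(writeKeys)));
        return true;                                // certified: apply/broadcast writeset
    }

    // Writesets a replica missed while it was down: the recovery transfer.
    public synchronized List<WriteSet> missedSince(long lastAppliedTs) {
        return history.stream()
                .filter(ws -> ws.commitTimestamp() > lastAppliedTs)
                .toList();
    }
}
```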

    AcDWH - A patented method for active data warehousing

    Get PDF
    The traditional needs of data warehousing, from monthly, weekly or nightly batch processing, have evolved into near real-time refresh cycles of the data, called active data warehousing. While traditional data warehousing methods have been used to batch-load large sets of data in the past, the business need for extremely fresh data in the data warehouse has increased. Previous studies have reviewed different aspects of the process along with different methods to process data in data warehouses in a near real-time fashion. To date, there has been little research on using partitioned staging tables within relational databases, combined with a crafted metadata-driven system and parallelized loading processes, for active data warehousing. This study provides a thorough description and suitability assessment of the patented AcDWH method for active data warehousing. In addition, it provides a review and summary of existing research in the data warehousing area from the start of data warehousing in the 1990s to 2020. The review focuses on different parts of the data warehousing process and highlights the differences compared to the AcDWH method. Related to AcDWH, the usage of partitioned staging tables within a relational database, in combination with the metadata structures used to manage the system, is discussed in detail. In addition, two real-life applications are disclosed and discussed at a high level. Potential future extensions to the methodology are discussed and briefly summarized. The results indicate that the AcDWH method, using parallelized loading pipelines and partitioned staging tables, can provide enhanced throughput in data warehouse loading processes. This is a clear improvement in the field. Previous studies have not considered using partitioned staging tables in conjunction with parallelized loading processes and pipelines. A review of the existing literature against the AcDWH method, together with a trial-and-error approach, shows that the results and conclusions of this study are genuine. The results confirm that technical-level inventions within data warehousing processes also make a significant contribution to the advance of methodologies. Compared to previous studies in the field, this study suggests a simple yet novel method to achieve near real-time capabilities in active data warehousing.
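    A hedged sketch of the general idea of parallel loading into partitioned staging tables, not the patented AcDWH design: each pipeline batches rows into its own partition through JDBC, so pipelines do not block one another. Table, column and connection names are assumptions.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: each loading pipeline batches rows into its own partition of a
// staging table, so pipelines run in parallel without blocking one another.
public class ParallelStagingLoad {
    public static void main(String[] args) {
        List<Integer> partitions = List.of(1, 2, 3, 4);
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        for (int p : partitions) {
            pool.submit(() -> loadPartition(p));    // one pipeline per partition
        }
        pool.shutdown();
    }

    static void loadPartition(int partition) {
        String sql = "INSERT INTO stage_sales (partition_id, sale_id, amount) VALUES (?, ?, ?)";
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:db2://dwh:50000/DWH", "etl", "secret");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            conn.setAutoCommit(false);
            for (int i = 0; i < 1000; i++) {        // stand-in for a real source feed
                ps.setInt(1, partition);
                ps.setInt(2, partition * 1000 + i);
                ps.setBigDecimal(3, BigDecimal.valueOf(i));
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();                          // partition becomes visible to the merge step
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```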

    Improving Reuse of Distributed Transaction Software with Transaction-Aware Aspects

    Get PDF
    Implementing crosscutting concerns for transactions is difficult, even using Aspect-Oriented Programming Languages (AOPLs) such as AspectJ. Many of these challenges arise because the context of a transaction-related crosscutting concern consists of loosely-coupled abstractions like dynamically-generated identifiers, timestamps, and tentative value sets of distributed resources. Current AOPLs do not provide joinpoints and pointcuts for weaving advice into high-level abstractions or contexts, like transaction contexts. Other challenges stem from the essential complexity in the nature of the data, the operations on the data, or the volume of data, while accidental complexity comes from the way the problem is being solved, even using common transaction frameworks. This dissertation describes an extension to AspectJ, called TransJ, with which developers can implement transaction-related crosscutting concerns in cohesive and loosely-coupled aspects. It also presents a preliminary experiment that provides evidence of improved reusability without sacrificing the performance of applications requiring essential transactions. This empirical study is conducted using an extended quality model for transactional applications to define measurements on transaction software systems. The quality model defines three goals: the first relates to code quality (in terms of its reusability); the second to software performance; and the third to software development efficiency. Results from this study show that TransJ can improve reusability while maintaining the performance of applications requiring transactions, across all eight areas addressed by the hypotheses: better encapsulation and separation of concerns; looser coupling, higher cohesion and less tangling; improved obliviousness; preserved software efficiency; improved extensibility; and a faster development process.
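    For contrast with TransJ, the sketch below shows a conventional AspectJ annotation-style aspect (assuming the AspectJ runtime is on the classpath): the advice attaches to a low-level joinpoint, a method execution, rather than to a transaction context as a whole. Package, class and method names are hypothetical.

```java
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

// Conventional AspectJ (annotation style) sketch, shown for contrast: the advice
// attaches to a low-level joinpoint (a method execution), not to a transaction
// context as a whole. Package, class and method names are hypothetical.
@Aspect
public class TransactionTimingAspect {

    // Matches any commit() method in a hypothetical com.example.tx package.
    @Around("execution(* com.example.tx..*.commit(..))")
    public Object timeCommit(ProceedingJoinPoint pjp) throws Throwable {
        long start = System.nanoTime();
        try {
            return pjp.proceed();                   // run the original commit
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(pjp.getSignature() + " committed in " + elapsedMs + " ms");
        }
    }
}
```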

    Transactional concurrency control for resource constrained applications

    Get PDF
    PhD thesis. Transactions have long been used as a mechanism for ensuring the consistency of databases. Databases, and associated transactional approaches, have always been an active area of research as different application domains and computing architectures have placed ever more elaborate requirements on shared data access. As transactions typically provide consistency at the expense of timeliness (abort/retry) and resources (duplicated shared data and locking), there have been substantial efforts to limit these two aspects of transactions while still satisfying application requirements. In environments where clients are geographically distant from a database, the consistency/performance trade-off becomes acute, as any retrieval of data over a network is not only expensive but relatively slow compared to co-located client/database systems. Furthermore, for battery-powered clients the increased overhead of transactions can also be viewed as a significant power overhead. However, for all their drawbacks, transactions do provide the data consistency that is a requirement for many application types. In this thesis we explore the solution space related to timely transactional systems for remote clients and centralised databases, with a focus on providing a solution that, when compared to others' work in this domain: (a) maintains consistency; (b) lowers latency; and (c) improves throughput. To achieve this we revisit a technique, first developed to decrease disk access times via local caching of state (for aborted transactions), to tackle the problems prevalent in real-time databases. We demonstrate that such a technique (rerun) allows a significant change in the typical structure of a transaction (one never before considered, even in rerun systems). Such a change brings significant performance gains not only in the traditional rerun local database solution space, but also in the distributed solution space. A byproduct of our improvements, one can argue, is also a "greener" solution, as less time coupled with improved throughput affords longer battery life for mobile devices.
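    A minimal sketch of one reading of the rerun idea, offered as an assumption rather than the thesis protocol: a remote client caches the values read by an aborted transaction so that the retry can execute against local state, with only stale items re-fetched from the central database. All class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Assumed reading of the rerun idea, not the thesis protocol: cache the values an
// aborted transaction read so the retry runs against local state and only the
// stale items are re-fetched from the central database.
public class RerunClient {
    private final Map<String, String> readCache = new HashMap<>();

    public String runTransaction(Function<Map<String, String>, String> body) {
        while (true) {
            Map<String, String> snapshot = new HashMap<>(readCache);  // local, cheap
            String result = body.apply(snapshot);                     // execute against cache
            if (commitAtServer(snapshot)) {
                return result;                                        // server validated our reads
            }
            refreshCacheFromServer(snapshot.keySet());                // one round trip, then rerun
        }
    }

    private boolean commitAtServer(Map<String, String> reads) {
        // Placeholder: ship the readset/writeset to the central database for validation.
        return true;
    }

    private void refreshCacheFromServer(Iterable<String> keys) {
        // Placeholder: would fetch only the stale items into readCache before retrying.
    }
}
```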

    Safe code transformations for speculative execution in real-time systems

    Get PDF
    Although compiler optimization techniques are standard and successful in non-real-time systems, if naively applied they can destroy safety guarantees and deadlines in hard real-time systems. For this reason, real-time systems developers have tended to avoid automatic compiler optimization of their code. However, real-time applications in several areas have been growing substantially in size and complexity in recent years. This size and complexity makes it impossible for real-time programmers to write optimal code, and consequently indicates a need for compiler optimization. Recently, researchers have developed or modified analyses and transformations to improve performance without degrading worst-case execution times. Moreover, these optimization techniques can sometimes transform programs which may not meet constraints/deadlines, or which result in timeouts, into deadline-satisfying programs. One such technique, speculative execution, also used for example in parallel computing and databases, can enhance performance by executing parts of the code whose execution may or may not be needed. In some cases, rollback is necessary if the computation turns out to be invalid. However, speculative execution must be applied carefully to real-time systems so that the worst-case execution path is not extended. Deterministic worst-case execution for satisfying hard real-time constraints, and speculative execution with rollback for improving average-case throughput, appear to lie on opposite ends of a spectrum of performance requirements and strategies. Nonetheless, this thesis shows that there are situations in which speculative execution can improve the performance of a hard real-time system, either by enhancing average performance while not affecting the worst case, or by actually decreasing the worst-case execution time. The thesis proposes a set of compiler transformation rules to identify opportunities for speculative execution and to transform the code. Proofs of semantic correctness and timeliness preservation are provided to verify the safety of applying the transformation rules to real-time systems. Moreover, an extensive experiment using simulation of randomly generated real-time programs has been conducted to evaluate the applicability and profitability of speculative execution. The simulation results indicate that speculative execution improves average execution time and program timeliness. Finally, a prototype implementation is described in which these transformations can be evaluated for realistic applications.
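    The thesis applies speculation through compiler transformation rules within a single schedule; the thread-based sketch below, an illustration only using the standard java.util.concurrent API, shows just the general speculate-then-discard pattern: expensive work starts before its guard is known and is simply ignored if the guard turns out to be false.

```java
import java.util.concurrent.CompletableFuture;

// Illustration of the speculate-then-discard pattern only; the thesis performs this
// at the compiler level via transformation rules, without extra threads.
public class SpeculationDemo {
    public static void main(String[] args) {
        // Start the potentially unneeded work early, off the critical path.
        CompletableFuture<Long> speculative =
                CompletableFuture.supplyAsync(SpeculationDemo::expensiveComputation);

        boolean guard = readSensor();               // the condition is only known later
        if (guard) {
            System.out.println("used speculative result: " + speculative.join());
        } else {
            // "Rollback" is trivial here: the speculative result is simply never used.
            System.out.println("speculation discarded");
        }
    }

    static long expensiveComputation() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += i;
        return sum;
    }

    static boolean readSensor() {
        return System.currentTimeMillis() % 2 == 0; // stand-in for a run-time condition
    }
}
```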