42 research outputs found

    Maintaining consistency in client-server database systems with client-side caching

    Get PDF
    PhD Thesis. Caching has been used in client-server database systems to improve the performance of applications. Much of the current work has concentrated on caching techniques at the server side, since the underlying assumption has been that clients are “thin”, with application-level processing taking place mainly at the server side. There is also a new class of “thick client” applications in which clients access the database at the server but also perform a substantial amount of processing at the client side; here client-side caching is needed to provide good performance. This thesis presents a transactional cache consistency scheme suitable for systems with client-side caching. The scheme is based on the optimistic approach to concurrency control and provides serializability for committed transactions. This is in contrast to many modern systems that provide only snapshot isolation, which is weaker than serializability. A novel feature is that the processing load for validating transactions at commit time is shared between clients and the database server, thereby reducing the load at the server. Read-only transactions can be validated at the client side, without communicating with the server. Another feature is that the scheme permits disconnected operation, allowing clients with cached objects to work offline. The performance of the scheme is evaluated using simulation experiments. The experiments demonstrate that for a mostly read-only transaction load, for which caching is most effective, the scheme outperforms the existing client-side caching concurrency control scheme considered to be the best, and matches the performance of the widely used scheme that provides only snapshot isolation. The results also show that the scheme provides reasonable performance in a disconnected environment. Funding: Directorate General of Higher Education, Ministry of National Education, Indonesia.
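
    As an editorial illustration of the client-side validation idea described above, the following Python sketch shows how a read-only transaction might be committed using locally cached version information alone; the class and method names are hypothetical and are not taken from the thesis.

        # Hypothetical sketch: validating a read-only transaction at the client,
        # without contacting the server, in an optimistic cache-consistency scheme.
        class ClientCache:
            def __init__(self):
                self.objects = {}   # oid -> (value, version) as cached
                self.latest = {}    # oid -> newest committed version known locally,
                                    # refreshed by server replies and callbacks

            def apply_server_notification(self, oid, committed_version):
                self.latest[oid] = committed_version

            def read(self, txn_read_set, oid):
                value, version = self.objects[oid]
                txn_read_set[oid] = version      # remember what this transaction saw
                return value

            def validate_read_only(self, txn_read_set):
                # The transaction commits locally iff every version it read is still
                # the newest committed version the client knows about.
                return all(self.latest.get(oid, ver) == ver
                           for oid, ver in txn_read_set.items())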

    Application-level caching with transactional consistency

    Get PDF
    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from PDF version of thesis. Includes bibliographical references (p. 147-159). Distributed in-memory application data caches like memcached are a popular solution for scaling database-driven web sites. These systems increase performance significantly by reducing load on both the database and application servers. Unfortunately, such caches present two challenges for application developers. First, they cannot ensure that the application sees a consistent view of the data within a transaction, violating the isolation properties of the underlying database. Second, they leave the application responsible for locating data in the cache and keeping it up to date, a frequent source of application complexity and programming errors. This thesis addresses both of these problems in a new cache called TxCache. TxCache is a transactional cache: it ensures that any data seen within a transaction, whether from the cache or the database, reflects a slightly stale but consistent snapshot of the database. TxCache also offers a simple programming model. Application developers simply designate certain functions as cacheable, and the system automatically caches their results and invalidates the cached data as the underlying database changes. Our experiments found that TxCache can substantially increase the performance of a web application: on the RUBiS benchmark, it increases throughput by up to 5.2x relative to a system without caching. More importantly, on this application, TxCache achieves performance comparable (within 5%) to that of a non-transactional cache, showing that consistency does not have to come at the price of performance. By Dan R. K. Ports. Ph.D.
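
    To illustrate the "cacheable function" programming model described above, here is a toy Python sketch; it is not the real TxCache API, and it only hints at the snapshot tracking and invalidation machinery the system actually uses. The database stand-in, snapshot counter, and function names are all assumptions.

        import functools

        DATABASE = {"item-1": 42}   # stand-in for the real database
        _current_snapshot = 0       # advanced whenever the "database" commits new state
        _cache = {}                 # (function name, args) -> (result, snapshot id)

        def cacheable(func):
            """Cache results keyed by arguments; a hit is used only if it was
            computed against the snapshot the current transaction is reading."""
            @functools.wraps(func)
            def wrapper(*args):
                key = (func.__name__, args)
                hit = _cache.get(key)
                if hit is not None and hit[1] == _current_snapshot:
                    return hit[0]                  # consistent with the current snapshot
                result = func(*args)               # recompute against the database
                _cache[key] = (result, _current_snapshot)
                return result
            return wrapper

        @cacheable
        def highest_bid(item_id):
            return DATABASE[item_id]               # stand-in for a database query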

    WRITE-INTENSIVE DATA MANAGEMENT IN LOG-STRUCTURED STORAGE

    Get PDF
    Ph.D. thesis (Doctor of Philosophy).

    Cost- and workload-driven data management in the cloud

    Get PDF
    This thesis deals with the challenge of finding the right balance between consistency, availability, latency and costs, captured by the CAP/PACELC trade-offs, in the context of distributed data management in the Cloud. At the core of this work, cost- and workload-driven data management protocols, called CCQ protocols, are developed. First, C3, an adaptive consistency protocol that adjusts consistency at runtime by weighing consistency and inconsistency costs. Second, Cumulus, an adaptive data partitioning protocol that adapts partitions to the application workload so that expensive distributed transactions are minimized or avoided. Third, QuAD, a quorum-based replication protocol that constructs quorums so that, given a set of constraints, the best possible performance is achieved. The behavior of each CCQ protocol is steered by a cost model, which aims at reducing the costs and overhead of providing the desired data management guarantees. The CCQ protocols continuously assess their behavior and, if necessary, adapt it at runtime based on the application workload and the cost model. This property is crucial for applications deployed in the Cloud, as they are characterized by a highly dynamic workload and high scalability and availability demands. The dynamic adaptation of the behavior at runtime does not come for free and may generate considerable overhead that can outweigh the gain of adaptation. The CCQ cost models therefore incorporate a control mechanism that aims at avoiding expensive and unnecessary adaptations that provide no benefit to applications. Adaptation is a distributed activity that requires coordination between the sites of a distributed database system. The CCQ protocols implement safe online adaptation approaches, which exploit the properties of 2PC and 2PL to ensure that all sites behave in accordance with the cost model, even in the presence of arbitrary failures. It is crucial to guarantee a globally consistent view of the behavior, as otherwise the effects of the cost models are nullified. The presented protocols are implemented as part of a prototypical database system. Their modular architecture allows for a seamless extension of the optimization capabilities at any level of their implementation. Finally, the protocols are quantitatively evaluated in a series of experiments executed in a real Cloud environment. The results show their feasibility and ability to reduce application costs and to dynamically adjust the behavior at runtime without violating their correctness.
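
    The Python sketch below illustrates, in a purely hypothetical form, the kind of cost-driven decision an adaptive consistency protocol such as C3 makes: strong consistency is chosen only when the expected cost of inconsistency exceeds the coordination overhead of enforcing it. The formula, parameters and numbers are illustrative assumptions, not the thesis' actual cost model.

        def choose_consistency(update_rate, read_rate,
                               penalty_per_stale_read, coordination_cost_per_op):
            """Return 'strong' or 'weak' for the next adaptation window."""
            # Expected cost of serving stale reads under weak consistency:
            # roughly proportional to how often reads race with concurrent updates.
            stale_fraction = update_rate / (update_rate + read_rate)
            inconsistency_cost = read_rate * stale_fraction * penalty_per_stale_read

            # Cost of coordinating every operation under strong consistency.
            consistency_cost = (read_rate + update_rate) * coordination_cost_per_op

            return "strong" if inconsistency_cost > consistency_cost else "weak"

        # Example window: 5 updates, 95 reads, cheap coordination -> "strong".
        print(choose_consistency(update_rate=5, read_rate=95,
                                 penalty_per_stale_read=2.0,
                                 coordination_cost_per_op=0.01))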

    A Survey of Traditional and Practical Concurrency Control in Relational Database Management Systems

    Get PDF
    Traditionally, database theory has focused on concepts such as atomicity and serializability, asserting that concurrent transaction management must enable correctness above all else. Textbooks and academic journals detail a vision of unbounded rationality, where reduced throughput because of concurrency protocols is not of tremendous concern. This thesis seeks to survey the traditional basis for concurrency in relational database management systems and contrast that with actual practice. SQL-92, the current standard for concurrency in relational database management systems, defines isolation levels, or allowable degrees of concurrency, and these are examined. Some ways in which DB2, a popular database, interprets these levels and finesses extra concurrency through performance enhancements are detailed. SQL-92 standardizes de facto relational database management system features. Given this, and a superabundance of articles in professional journals detailing steps for fine-tuning transaction concurrency, the prospects for performance tuning seem bright, even at the expense of serializability. Are the practical changes wrought by non-academic professionals killing traditional database concurrency ideals? Not really. Reasoned changes for performance gains advocate compromise, using complex concurrency controls when necessary for the job at hand and relaxing standards otherwise. The idea of relational database management systems is only twenty years old, and standards are still evolving. Is there still an interplay between tradition and practice? Of course. Current practice uses tradition pragmatically, not idealistically. Academic ideas help drive the systems available for use, and perhaps current practice now will help academic ideas define concurrency control concepts for relational database management systems.
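
    For reference, the Python sketch below lists the four SQL-92 isolation levels the survey discusses and issues one through a generic, hypothetical DB-API connection; relaxing the level trades serializability for throughput, which is the practical compromise the thesis examines. The connection object and query are placeholders, not tied to DB2 or any specific driver.

        SQL92_ISOLATION_LEVELS = [
            "READ UNCOMMITTED",   # dirty reads possible
            "READ COMMITTED",     # no dirty reads
            "REPEATABLE READ",    # no non-repeatable reads; phantoms still possible
            "SERIALIZABLE",       # full serializability
        ]

        def count_orders(conn, level="READ COMMITTED"):
            """Run a read-only query at a deliberately relaxed isolation level."""
            cur = conn.cursor()
            cur.execute(f"SET TRANSACTION ISOLATION LEVEL {level}")
            cur.execute("SELECT COUNT(*) FROM orders")
            (n,) = cur.fetchone()
            conn.commit()
            return n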

    Rethinking serializable multiversion concurrency control

    Full text link
    Multi-versioned database systems have the potential to significantly increase the amount of concurrency in transaction processing because they can avoid read-write conflicts. Unfortunately, the increase in concurrency usually comes at the cost of transaction serializability. If a database user requests full serializability, modern multi-versioned systems significantly constrain read-write concurrency among conflicting transactions and employ expensive synchronization patterns in their design. In main-memory multi-core settings, these additional constraints are so burdensome that multi-versioned systems are often significantly outperformed by single-version systems. We propose Bohm, a new concurrency control protocol for main-memory multi-versioned database systems. Bohm guarantees serializable execution while ensuring that reads never block writes. In addition, Bohm does not require reads to perform any book-keeping whatsoever, thereby avoiding the overhead of tracking reads via contended writes to shared memory. This leads to excellent scalability and performance in multi-core settings. Bohm has all the above characteristics without performing validation-based concurrency control. Instead, it is pessimistic, and is therefore not prone to excessive aborts in the presence of contention. An experimental evaluation shows that Bohm performs well in both high-contention and low-contention settings, and is able to dramatically outperform state-of-the-art multi-versioned systems despite maintaining the full set of serializability guarantees.
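
    The sketch below illustrates, in simplified Python, the general multi-version read rule that lets reads proceed without blocking writers: every committed write installs a new timestamped version, and a reader at timestamp T sees the newest version stamped at or before T. It conveys the MVCC principle behind systems like Bohm, not Bohm's actual data structures or its pessimistic version pre-allocation.

        import bisect

        class MultiVersionRecord:
            def __init__(self, initial_value):
                self.timestamps = [0]          # sorted commit timestamps
                self.values = [initial_value]

            def write(self, ts, value):
                """Install a new version; never disturbs what readers can already see."""
                i = bisect.bisect_left(self.timestamps, ts)
                self.timestamps.insert(i, ts)
                self.values.insert(i, value)

            def read(self, ts):
                """Return the latest version visible at timestamp ts."""
                i = bisect.bisect_right(self.timestamps, ts) - 1
                return self.values[i] if i >= 0 else None

        rec = MultiVersionRecord("v0")
        rec.write(10, "v1")
        print(rec.read(5))    # -> 'v0' : a reader at ts=5 ignores the later write
        print(rec.read(15))   # -> 'v1'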

    New Concepts for Virtual Testbeds : Data Mining Algorithms for Blackbox Optimization based on Wait-Free Concurrency and Generative Simulation

    Get PDF
    Virtual testbeds have emerged as a key technology for improving and streamlining complex engineering processes by delivering long-term simulation and assessment of complex designs in virtual environments. In contrast to existing simulation technology, virtual testbeds focus on long-term, physically-based simulation of the overall design in its (virtual) environment instead of focussing only on isolated, specific parts for short periods of time. This technology has the major advantage that costly testing, prototyping, and assessment in real-life environments are replaced by cost-efficient simulation in virtual worlds for comprehensive and long-term analysis of designs. For this purpose, engineering models and their requirements are abstracted into software simulation models and objectives which are executed in virtual assessments. Simulation models are used to predict complex, real systems which can further be subject to random influences. These predictions are used to examine the effects of individual configuration alternatives without actually realizing them and causing possible negative effects on the real system. Virtual testbeds further offer engineers the opportunity to immersively and naturally interact with their simulation model in these virtual assessments. This gives engineers a more comprehensive understanding of possible design flaws early in the design process, because they can directly assess their design in the virtual environment against the simulation objectives. The fact that virtual testbeds enable these realtime interactive virtual assessments makes their underlying software infrastructure very complex. One major challenge is to minimize the development time of virtual testbeds in order to efficiently integrate them into the overall engineering process. Usually, this can be achieved by minimizing the underlying concurrency of the testbed and by simplifying its software architecture. However, this may degrade the highly concurrent and asynchronous behavior that is usually required for immersive and natural virtual interaction. A major goal of virtual testbeds in the engineering process is to find a set of optimal configurations of the simulation model which maximizes all simulation objectives for the specified virtual assessments. Once such a set has been computed, engineers can interactively explore it in the virtual environment. The main challenge is that sophisticated simulation models and their configuration are subject to a multiobjective optimization problem, which usually cannot be solved manually by engineers or simulation analysts in feasible time. This is further aggravated because the relationships between simulation model configurations and simulation objectives are mostly unknown, leading to what are known as blackbox simulations. In this thesis, I propose novel data mining algorithms for computing Pareto-optimal simulation model configurations, based on an approximation of the feasible design space, for deterministic and stochastic blackbox simulations in virtual testbeds, in order to achieve the goal stated above. These data mining algorithms lead to an automatic knowledge discovery process that needs no supervision of its data analysis and assessment for multiobjective optimization problems over simulation model configurations. This achieves the previously stated goal of computing optimal configurations of simulation models for long-term simulations and assessments.
    Furthermore, I propose two complementary solutions for efficiently integrating massively-parallel virtual testbeds into engineering processes. First, I propose a novel multiversion, wait-free data and concurrency management based on hash maps. These wait-free hash maps do not require any standard locking mechanisms and enable low-latency data generation, management and distribution for massively-parallel applications. Second, I propose novel concepts for efficiently generating code for the above wait-free data and concurrency management in arbitrary massively-parallel simulation applications of virtual testbeds. My generative simulation concept combines a state-of-the-art realtime interactive system design pattern for high maintainability with template code generation based on domain-specific modelling. This concept can generate massively-parallel simulations and, at the same time, model-check their internal dataflow for possible interface errors. This generative concept overcomes the challenge of efficiently integrating virtual testbeds into engineering processes. These contributions enable, for the first time, a powerful collaboration between simulation, optimization, visualization and data analysis for novel virtual testbed applications, while also addressing the challenges and goals presented above.
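
    As a small illustration of the multiobjective setting described in this abstract, the Python sketch below applies the standard Pareto-dominance test to a handful of evaluated simulation model configurations; it is a generic textbook filter, not the data mining algorithms proposed in the thesis, and the example objectives are invented.

        def dominates(a, b):
            """True if objective vector `a` dominates `b` (maximization)."""
            return (all(x >= y for x, y in zip(a, b))
                    and any(x > y for x, y in zip(a, b)))

        def pareto_front(evaluated):
            """evaluated: list of (configuration, objective_vector) pairs."""
            return [(cfg, obj) for cfg, obj in evaluated
                    if not any(dominates(other, obj) for _, other in evaluated
                               if other is not obj)]

        # Toy example with two objectives, e.g. (accuracy, negated runtime):
        configs = [("A", (0.9, -5.0)), ("B", (0.8, -3.0)), ("C", (0.7, -6.0))]
        print([cfg for cfg, _ in pareto_front(configs)])   # -> ['A', 'B']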

    Mermera: Non-Coherent Distributed Shared Memory for Parallel Computing

    Full text link
    The proliferation of inexpensive workstations and networks has prompted several researchers to use such distributed systems for parallel computing. Attempts have been made to offer a shared-memory programming model on such distributed-memory computers. Most systems provide a shared memory that is coherent, in that all processes that use it agree on the order of all memory events. This dissertation explores the possibility of a significant improvement in the performance of some applications when they use non-coherent memory. First, a new formal model to describe existing non-coherent memories is developed. I use this model to prove that certain problems can be solved using asynchronous iterative algorithms on shared memory in which the coherence constraints are substantially relaxed. In the course of the development of the model I discovered a new type of non-coherent behavior called Local Consistency. Second, a programming model, Mermera, is proposed. It provides programmers with a choice of hierarchically related non-coherent behaviors along with one coherent behavior. Thus, one can trade off the ease of programming with coherent memory for improved performance with non-coherent memory. As an example, I present a program to solve a linear system of equations using an asynchronous iterative algorithm. This program uses all the behaviors offered by Mermera. Third, I describe the implementation of Mermera on a BBN Butterfly TC2000 and on a network of workstations. The performance of a version of the equation-solving program that uses all the behaviors of Mermera is compared with that of a version that uses coherent behavior only. For a system of 1000 equations the former exhibits at least a 5-fold improvement in convergence time over the latter. The version using coherent behavior only does not benefit from employing more than one workstation to solve the problem, while the program using non-coherent behavior continues to achieve improved performance as the number of workstations is increased from 1 to 6. This measurement corroborates our belief that non-coherent shared memory can be a performance boon for some applications.
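
    To illustrate why such algorithms tolerate relaxed coherence, the Python sketch below simulates an asynchronous Jacobi-style iteration in which each update reads a possibly stale view of the solution vector; for diagonally dominant systems the iteration still converges. It is a sequential simulation of the idea, not Mermera's API, and all names are illustrative.

        import random

        def jacobi_async(A, b, iterations=200):
            """Solve Ax = b with randomized single-entry Jacobi updates."""
            n = len(b)
            x = [0.0] * n
            for _ in range(iterations):
                i = random.randrange(n)                 # a "process" picks an entry
                stale = list(x)                         # possibly out-of-date view of x
                s = sum(A[i][j] * stale[j] for j in range(n) if j != i)
                x[i] = (b[i] - s) / A[i][i]             # unsynchronized write-back
            return x

        # Diagonally dominant 2x2 system with solution approximately [1, 1]:
        A = [[4.0, 1.0], [2.0, 5.0]]
        b = [5.0, 7.0]
        print(jacobi_async(A, b))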