10 research outputs found

    Evolving NoSQL Databases Without Downtime

    Full text link
    NoSQL databases like Redis, Cassandra, and MongoDB are increasingly popular because they are flexible, lightweight, and easy to work with. Applications that use these databases will evolve over time, sometimes necessitating (or preferring) a change to the format or organization of the data. The problem we address in this paper is: How can we support the evolution of high-availability applications and their NoSQL data online, without excessive delays or interruptions, even in the presence of backward-incompatible data format changes? We present KVolve, an extension to the popular Redis NoSQL database, as a solution to this problem. KVolve permits a developer to submit an upgrade specification that defines how to transform existing data to the newest version. This transformation is applied lazily as applications interact with the database, thus avoiding long pause times. We demonstrate that KVolve is expressive enough to support substantial practical updates, including format changes to RedisFS, a Redis-backed file system, while imposing essentially no overhead in general use and minimal pause times during updates.
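
    KVolve itself is implemented inside Redis in C; the short Python sketch below only illustrates the lazy-transformation idea the abstract describes, with a plain dictionary standing in for the key-value store. All names and the upgrade-registration interface are illustrative assumptions, not KVolve's actual API.

        # Lazy, per-key data-format upgrades in the spirit of KVolve.
        # A dict stands in for Redis; names are illustrative.
        class LazyUpgradingStore:
            def __init__(self):
                self._data = {}       # key -> (version, value)
                self._upgrades = {}   # (from_version, to_version) -> function
                self.current_version = 0

            def register_upgrade(self, from_v, to_v, fn):
                """Submit an upgrade specification: fn maps an old-format
                value to its new-format equivalent."""
                self._upgrades[(from_v, to_v)] = fn
                self.current_version = max(self.current_version, to_v)

            def put(self, key, value):
                # Writes always store data in the newest format.
                self._data[key] = (self.current_version, value)

            def get(self, key):
                version, value = self._data[key]
                # Upgrade lazily, one version step at a time, only when
                # the key is actually touched: no stop-the-world pass.
                while version < self.current_version:
                    value = self._upgrades[(version, version + 1)](value)
                    version += 1
                    self._data[key] = (version, value)
                return value

        store = LazyUpgradingStore()
        store.put("user:1", {"name": "Ada Lovelace"})
        # Version 1 splits the single name field into first/last.
        store.register_upgrade(0, 1, lambda v: {
            "first": v["name"].split()[0],
            "last": v["name"].split()[-1],
        })
        print(store.get("user:1"))  # upgraded on first access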

    Towards Online and Transactional Relational Schema Transformations

    Get PDF
    In this paper, we want to draw the attention of the database community to the problem of online schema changes: changing the schema of a database without blocking concurrent transactions. We have identified important classes of relational schema transformations that we want to perform online, and we have identified general requirements for the mechanisms that execute these transformations. Using these requirements, we have developed an experiment based on the standard TPC-C benchmark to assess the behaviour of existing systems. We look at PostgreSQL, which does not support online schema changes; MySQL, which supports basic online schema changes; and pt-online-schema-change, which is a tool for MySQL that uses triggers to implement online schema changes. We found that none of the existing systems fulfill our requirements. In particular, existing non-blocking solutions cannot maintain the ACID guarantees when composing schema transformations. This leads to intermediate states being exposed to database programs, which are non-trivial to handle correctly. As a solution direction, we propose lazy schema transformations, which can naturally be composed into complex schema transformations that properly guarantee the ACID properties, and which have minimal impact on concurrent transactions.
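
    As a rough illustration of the proposed solution direction, the Python sketch below composes two lazy schema transformations and applies them to a row only when it is read, in a single step, so that no intermediate schema state becomes visible. This is a toy model built on our own assumptions, not the paper's mechanism.

        # Lazy, composable schema transformations: rows migrate only when
        # touched, and composed steps are applied atomically per row.
        def split_name(row):
            first, _, last = row.pop("name").partition(" ")
            return {**row, "first": first, "last": last}

        def add_display(row):
            return {**row, "display": (row["first"] + " " + row["last"]).upper()}

        class LazyTable:
            def __init__(self, rows):
                self._rows = {rid: (0, r) for rid, r in rows.items()}
                self._pipeline = []   # ordered transformations, applied lazily

            def alter(self, fn):
                self._pipeline.append(fn)

            def read(self, rid):
                ver, row = self._rows[rid]
                # A reader sees either the old schema version or the
                # final one, never an intermediate state.
                for fn in self._pipeline[ver:]:
                    row = fn(row)
                self._rows[rid] = (len(self._pipeline), row)
                return row

        t = LazyTable({1: {"name": "Grace Hopper"}})
        t.alter(split_name)
        t.alter(add_display)   # composed with the previous change
        print(t.read(1))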

    Online Schema Evolution is (Almost) Free for Snapshot Databases

    Full text link
    Modern database applications often change their schemas to keep up with the changing requirements. However, support for online and transactional schema evolution remains challenging in existing database systems. Specifically, prior work often takes ad hoc approaches to schema evolution with 'patches' applied to existing systems, leading to many corner cases and often incomplete functionality. Applications therefore often have to carefully schedule downtimes for schema changes, sacrificing availability. This paper presents Tesseract, a new approach to online and transactional schema evolution without the aforementioned drawbacks. We design Tesseract based on a key observation: in widely used multi-versioned database systems, schema evolution can be modeled as data modification operations that change the entire table, i.e., data-definition-as-modification (DDaM). This allows us to support schema evolution almost 'for free' by leveraging the concurrency control protocol. By simple tweaks to existing snapshot isolation protocols, on a 40-core server we show that under a variety of workloads, Tesseract is able to provide online, transactional schema evolution without service downtime, and retain high application performance when schema evolution is in progress. (To appear in the Proceedings of the 2023 International Conference on Very Large Data Bases, VLDB 2023.)
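
    To make the DDaM observation concrete, here is a minimal multi-version store in Python in which the schema is just another versioned record: a schema change commits like any data write, and readers on older snapshots still see the old schema and data. The structure and names are our own simplification, not Tesseract's implementation.

        # Toy multi-version store: the schema is a versioned record too.
        class MVStore:
            def __init__(self):
                self.ts = 0
                self.versions = {}   # key -> list of (commit_ts, value)

            def write(self, key, value):
                self.ts += 1
                self.versions.setdefault(key, []).append((self.ts, value))
                return self.ts

            def read(self, key, snapshot_ts):
                # Return the newest version visible at the snapshot.
                return max((v for v in self.versions[key] if v[0] <= snapshot_ts),
                           key=lambda v: v[0])[1]

        db = MVStore()
        db.write("_schema", ["id", "name"])
        db.write("row:1", [1, "Alan Turing"])
        old_snap = db.ts

        # Schema evolution = a transaction that writes a new schema
        # version plus rewritten rows; old snapshots are unaffected.
        db.write("_schema", ["id", "first", "last"])
        db.write("row:1", [1, "Alan", "Turing"])
        new_snap = db.ts

        print(db.read("_schema", old_snap), db.read("row:1", old_snap))
        print(db.read("_schema", new_snap), db.read("row:1", new_snap))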

    Ariadne: A Novel High-Availability Cloud Data Store with Transactional Guarantees

    Get PDF
    Modern cloud data storage services have powerful capabilities for datasets that can be indexed by a single key (key-value stores) and for datasets that are characterized by multiple attributes (such as Google's BigTable). These data stores incur non-ideal overheads, however, when graph data must be maintained, because keys that are related by graph edges are managed on physically different host machines. We propose a new distributed data-storage paradigm, the key-key-value store, which extends the key-value model and significantly reduces these overheads by storing related keys in the same place. We provide a high-level description of our proposed system for storing large-scale, highly interconnected graph data, such as social networks, as well as an analysis of our key-key-value system in relation to existing work. In this thesis, we show how our novel data organization paradigm facilitates improved levels of QoS in large graph data stores. Furthermore, we have built a system with our key-key-value design, Ariadne, that is decentralized, scalable, lightweight, relational, and transactional; such a system is unique among current systems in providing all of these qualities at once. The system was put to the test in the cloud using a strenuous concurrent workload and compared against the state-of-the-art MySQL Cluster database system. Results show great promise for scalability and more consistent performance across workload types in Ariadne compared to MySQL Cluster.
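
    The following Python fragment sketches the key-key-value placement idea as we read it: edge values keyed by a pair of keys are placed by the first key, so a vertex's entire neighborhood lives on one (simulated) node. It is a toy model; Ariadne's actual partitioning, replication, and transaction machinery are far richer.

        # Key-key-value placement: co-locate all (src, *) pairs on the
        # node that owns src, so a neighborhood read touches one host.
        class KKVStore:
            def __init__(self, num_nodes=4):
                self.nodes = [dict() for _ in range(num_nodes)]

            def _node(self, key):
                return self.nodes[hash(key) % len(self.nodes)]

            def put(self, src, dst, value):
                self._node(src).setdefault(src, {})[dst] = value

            def neighbors(self, src):
                # One host holds the whole adjacency list of src.
                return self._node(src).get(src, {})

        g = KKVStore()
        g.put("alice", "bob", {"since": 2019})
        g.put("alice", "carol", {"since": 2021})
        print(g.neighbors("alice"))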

    The Evolution of Cloud Data Architectures: Storage, Compute, and Migration

    Get PDF
    Recent advances in data architectures have shifted from on-premises deployments to the cloud. However, new challenges emerge as data volumes continue to grow at an exponential rate. As a result, my Ph.D. research focuses on addressing the following challenges. First, cloud data warehouses such as Snowflake, BigQuery, and Redshift often rely on storage systems such as distributed file systems or object stores to store massive amounts of data. The growth of data volumes is accompanied by an increase in the number of objects stored and the amount of metadata such systems must manage. By treating metadata management similarly to data management, we built FileScale, an HDFS-based file system that replaces metadata management in HDFS with a three-tiered distributed architecture that incorporates a high-throughput, distributed main-memory database system at the lowest layer, along with distributed caching and routing functionality above it. FileScale performs comparably to the single-machine architecture at small scale, while enabling linear scalability as the file system metadata increases. Second, Function as a Service, or FaaS, is a new type of cloud-computing service that executes code in response to events without the complex infrastructure typically associated with building and launching microservices applications. FaaS offers cloud functions, billed at millisecond granularity, that can be scaled automatically, independently, and instantaneously as needed. We built Flock, the first practical cloud-native SQL query engine that supports event stream processing on FaaS across heterogeneous hardware (x86 and Arm), with the ability to shuffle and aggregate data without requiring a centralized coordinator or remote storage such as Amazon S3. This architecture is more cost-effective than traditional systems, especially for dynamic workloads and continuous queries. Third, Software as a Service, or SaaS, is a method of delivering software products to end-users over the internet via pay-as-you-go pricing, in which the software is centrally hosted and managed by the cloud service provider. Continuous Deployment (CD) in SaaS, an aspect of DevOps, is the increasingly popular practice of frequent, automated deployment of software changes. To realize the benefits of CD, it must be straightforward to deploy updates to both front-end code and the database, even when the database's schema has changed. Unfortunately, this is where current practices run into difficulty. So we built BullFrog, a PostgreSQL extension that is the first system to use lazy schema migration to support single-step, online schema evolution without downtime, achieving efficient, exactly-once physical migration of data under contention.
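
    Of the three systems, BullFrog is closest to the theme of this listing. The sketch below shows, in Python and with invented names, the lazy-migration property the abstract advertises: the new-schema table starts empty, the first query to touch a row migrates it physically, and synchronization keeps that migration exactly-once even under concurrent access. It is our own toy model, not BullFrog's implementation.

        # Lazy, exactly-once physical migration under contention.
        import threading

        old_table = {i: {"cents": i * 100} for i in range(3)}   # old schema
        new_table = {}                                          # new schema
        migrated = 0
        lock = threading.Lock()

        def read(rid):
            global migrated
            with lock:                   # serialize per-row migration
                if rid not in new_table:
                    # First query to touch the row migrates it; the lock
                    # makes the physical migration exactly-once.
                    row = old_table[rid]
                    new_table[rid] = {"dollars": row["cents"] / 100}
                    migrated += 1
            return new_table[rid]

        threads = [threading.Thread(target=read, args=(1,)) for _ in range(8)]
        for t in threads: t.start()
        for t in threads: t.join()
        print(new_table[1], "migrations:", migrated)   # exactly one migration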

    Dynamic Upgrades for High Availability Systems

    Get PDF
    In this thesis I show that it is possible to build general-purpose frameworks for efficient, on-line data transformation in support of flexible system services, especially dynamic software updates (DSU). This approach generalizes some of the ideas from prior work on DSU, making those ideas applicable to more situations. In particular, I generalize DSU's notion of in-memory state transformation, normally used to upgrade run-time state to be consistent with the new software, so that it can be applied to data not necessarily stored in memory, and for services other than DSU. To support this thesis, I present three artifacts. First, I present C-strider, a generic, type-aware C heap traversal framework. C-strider constitutes a flexible, easy-to-use framework with which developers can program reusable services that have a heap traversal at their core, e.g., serialization, profiling, invariant checking, and state transformation (in support of DSU). C-strider supports both parallel and single-threaded heap traversals, and I demonstrate that C-strider requires little programmer effort and that the resulting services are efficient and effective. Second, I present KVolve, a data transformation service for NoSQL databases. KVolve is notable in that transformations are carried out on-line and on-demand, as data is accessed, rather than off-line and all at once, which would reduce service availability. Experiments with on-line upgrades of services using KVolve show little overhead during normal operation and only brief pauses at update time. Finally, I present Morpheus, a dynamically updatable software-defined network (SDN) controller. Morpheus' architecture is fundamentally distributed, with each service running as a separate process that accesses a shared KVolve instance. Morpheus can update multiple controller applications without loss of availability or degradation of performance.
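
    C-strider is a C framework driven by program type information; the Python sketch below only conveys the architectural idea of one generic traversal engine with pluggable per-value actions, reused here for two different services. Everything about it is an illustrative assumption.

        # One traversal engine, many services: the visitor is pluggable.
        def traverse(value, visit, seen=None):
            seen = set() if seen is None else seen
            if id(value) in seen:        # handle shared/cyclic structure
                return
            seen.add(id(value))
            visit(value)                 # the pluggable per-value action
            if isinstance(value, dict):
                for v in value.values():
                    traverse(v, visit, seen)
            elif isinstance(value, (list, tuple, set)):
                for v in value:
                    traverse(v, visit, seen)

        state = {"conns": [{"fd": 3}, {"fd": -1}], "name": "srv"}

        # Service 1: profiling (collect the type of every reachable value).
        types = []
        traverse(state, lambda v: types.append(type(v).__name__))
        print(types)

        # Service 2: invariant checking (no connection has a negative fd).
        violations = []
        def check(v):
            if isinstance(v, dict) and "fd" in v and v["fd"] < 0:
                violations.append(v)
        traverse(state, check)
        print(violations)   # [{'fd': -1}]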

    Evolution of Meta-Models with Language-Based Patterns

    Get PDF

    Multi-Schema-Version Data Management

    Get PDF
    Modern agile software development methods allow developers to continuously evolve software systems: new features are easily added, bugs are fixed, and the software is adapted to changing requirements and conditions while it remains in continuous use. A major obstacle to this agile evolution is the underlying database, which persists the software system's data from day one. Evolving the database schema requires evolving the existing data accordingly, and at this point the currently established solutions are expensive, error-prone, and far from agile. In this thesis, we present InVerDa, a multi-schema-version database system that facilitates agile database development. Multi-schema-version database systems provide multiple schema versions within the same database, where each schema version itself behaves like a regular single-schema database. Creating new schema versions is very simple, which provides the desired agility for database development. All created schema versions can co-exist, and write operations are immediately propagated between schema versions with a best-effort strategy. Developers do not have to implement the propagation logic of data accesses between schema versions by hand; InVerDa generates it automatically. To facilitate multi-schema-version database systems, we equip developers with a relationally complete and bidirectional database evolution language (BiDEL) that makes it easy to evolve existing schema versions into new ones. BiDEL expresses the evolution of both the schema and the data, both forwards and backwards, in intuitive and consistent operations; BiDEL evolution scripts are orders of magnitude shorter than implementations of the same behavior in standard SQL and are even less likely to be erroneous, since they describe a developer's intention for the evolution exclusively at the level of tables, without further technical details. Having the developers' intentions explicitly captured in BiDEL scripts further allows a new schema version to be created by merging already existing ones. Having multiple co-existing schema versions in one database raises the need for a sophisticated physical materialization. Multi-schema-version database systems provide full data independence, so the database administrator can choose any feasible materialization while the system internally ensures that no data is lost. The search space of possible materializations can grow exponentially with the number of schema versions. Therefore, we present an adviser that relieves the database administrator from diving into the complex performance characteristics of multi-schema-version database systems and simply proposes an optimized materialization for a given workload within seconds. Optimized materializations have been shown to improve performance for a given workload by orders of magnitude. We formally guarantee data independence for multi-schema-version database systems: we show that every single schema version behaves like a regular single-schema database, independent of the chosen physical materialization. This important guarantee makes it easy to evolve and access the database in agile software development, while all the important features of relational databases, such as transaction guarantees, are preserved. To the best of our knowledge, we are the first to realize such a multi-schema-version database system, allowing agile evolution of production databases with full support for co-existing schema versions and formally guaranteed data independence.
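
    A tiny Python model of the bidirectional principle behind BiDEL, under our own simplifying assumptions: one evolution step carries a forward and a backward mapping, two schema versions co-exist as separate views of the data, and a write in either version is immediately propagated to the other. InVerDa generates such propagation logic inside the DBMS; nothing here reflects its real interfaces.

        # Bidirectional evolution step: forward and backward mappings.
        forward = lambda r: {"first": r["name"].split()[0],
                             "last": r["name"].split()[-1]}
        backward = lambda r: {"name": r["first"] + " " + r["last"]}

        v1 = {}   # schema version 1: a single name column
        v2 = {}   # schema version 2: first/last columns

        def write_v1(rid, row):
            v1[rid] = row
            v2[rid] = forward(row)      # immediate best-effort propagation

        def write_v2(rid, row):
            v2[rid] = row
            v1[rid] = backward(row)

        write_v1(1, {"name": "Edgar Codd"})
        print(v2[1])                    # {'first': 'Edgar', 'last': 'Codd'}
        write_v2(1, {"first": "Ted", "last": "Codd"})
        print(v1[1])                    # {'name': 'Ted Codd'}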