    Cricket: A Mapped, Persistent Object Store

    This paper describes Cricket, a new database storage system that is intended to be used as a platform for design environments and persistent programming languages. Cricket uses the memory management primitives of the Mach operating system to provide the abstraction of a shared, transactional single-level store that can be directly accessed by user applications. In this paper, we present the design and motivation for Cricket. We also present some initial performance results which show that, for its intended applications, Cricket can provide better performance than a general-purpose database storage system.
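
    The idea of a store that applications access directly through memory can be illustrated with a minimal sketch. It assumes Python's standard `mmap` module in place of the Mach primitives Cricket uses, and it omits sharing and transactions entirely; the names `open_store`, `increment`, and `demo` are hypothetical.

```python
import mmap
import os
import struct
import tempfile

PAGE = mmap.PAGESIZE  # size of the mapped region (one page)


def open_store(path):
    """Map a one-page file into memory, creating it zero-filled if needed."""
    if not os.path.exists(path) or os.path.getsize(path) < PAGE:
        with open(path, "wb") as f:
            f.write(b"\x00" * PAGE)
    f = open(path, "r+b")
    return f, mmap.mmap(f.fileno(), PAGE)


def increment(view):
    """Update a persistent 8-byte counter in place through the mapping."""
    (value,) = struct.unpack_from("<Q", view, 0)
    struct.pack_into("<Q", view, 0, value + 1)
    view.flush()  # ask the OS to write the dirty page back to the file
    return value + 1


def demo():
    """Reopen the store three times; the counter survives each reopen."""
    path = os.path.join(tempfile.mkdtemp(), "store.bin")
    results = []
    for _ in range(3):  # each iteration stands in for a fresh program run
        f, view = open_store(path)
        results.append(increment(view))
        view.close()
        f.close()
    return results
```

    The application never issues an explicit read or write call on the file; it only loads and stores through the mapping, which is the property a single-level store exposes.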

    Architectural Principles for Database Systems on Storage-Class Memory

    Database systems have long been optimized to hide the higher latency of storage media, yielding complex persistence mechanisms. With the advent of large DRAM capacities, it became possible to keep a full copy of the data in DRAM. Systems that leverage this possibility, such as main-memory databases, keep two copies of the data in two different formats: one in main memory and the other one in storage. The two copies are kept synchronized using snapshotting and logging. This main-memory-centric architecture yields nearly two orders of magnitude faster analytical processing than traditional, disk-centric ones. The rise of Big Data emphasized the importance of such systems with an ever-increasing need for more main memory. However, DRAM is hitting its scalability limits: It is intrinsically hard to further increase its density. Storage-Class Memory (SCM) is a group of novel memory technologies that promise to alleviate DRAM’s scalability limits. They combine the non-volatility, density, and economic characteristics of storage media with the byte-addressability and a latency close to that of DRAM. Therefore, SCM can serve as persistent main memory, thereby bridging the gap between main memory and storage. In this dissertation, we explore the impact of SCM as persistent main memory on database systems. Assuming a hybrid SCM-DRAM hardware architecture, we propose a novel software architecture for database systems that places primary data in SCM and directly operates on it, eliminating the need for explicit IO. This architecture yields many benefits: First, it obviates the need to reload data from storage to main memory during recovery, as data is discovered and accessed directly in SCM. Second, it allows replacing the traditional logging infrastructure by fine-grained, cheap micro-logging at data-structure level. Third, secondary data can be stored in DRAM and reconstructed during recovery. 
Fourth, system runtime information can be stored in SCM to improve recovery time. Finally, the system may retain and continue in-flight transactions in case of system failures. However, SCM is no panacea as it raises unprecedented programming challenges. Given its byte-addressability and low latency, processors can access, read, modify, and persist data in SCM using load/store instructions at a CPU cache line granularity. The path from CPU registers to SCM is long and mostly volatile, including store buffers and CPU caches, leaving the programmer with little control over when data is persisted. Therefore, there is a need to enforce the order and durability of SCM writes using persistence primitives, such as cache line flushing instructions. This in turn creates new failure scenarios, such as missing or misplaced persistence primitives. We devise several building blocks to overcome these challenges. First, we identify the programming challenges of SCM and present a sound programming model that solves them. Then, we tackle memory management, as the first required building block to build a database system, by designing a highly scalable SCM allocator, named PAllocator, that fulfills the versatile needs of database systems. Thereafter, we propose the FPTree, a highly scalable hybrid SCM-DRAM persistent B+-Tree that bridges the gap between the performance of transient and persistent B+-Trees. Using these building blocks, we realize our envisioned database architecture in SOFORT, a hybrid SCM-DRAM columnar transactional engine. We propose an SCM-optimized MVCC scheme that eliminates write-ahead logging from the critical path of transactions. Since SCM-resident data is near-instantly available upon recovery, the new recovery bottleneck is rebuilding DRAM-based data.
To alleviate this bottleneck, we propose a novel recovery technique that achieves nearly instant responsiveness of the database by accepting queries right after recovering SCM-based data, while rebuilding DRAM-based data in the background. Additionally, SCM brings new failure scenarios that existing testing tools cannot detect. Hence, we propose an online testing framework that is able to automatically simulate power failures and detect missing or misplaced persistence primitives. Finally, our proposed building blocks can serve to build more complex systems, paving the way for future database systems on SCM.
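
    The failure scenario of a missing persistence primitive can be illustrated with a toy simulation. This is only a sketch under stated assumptions: `SimulatedSCM`, `publish`, `recover`, and `run` are hypothetical names, the model is not the dissertation's testing framework, and it takes the pessimistic view that every unflushed store is lost on a power failure (real caches may also evict lines on their own).

```python
class SimulatedSCM:
    """Toy model: stores land in a volatile cache until explicitly persisted."""

    def __init__(self):
        self.durable = {}  # contents that survive a power failure
        self.cache = {}    # pending stores, lost on a power failure

    def store(self, addr, value):
        self.cache[addr] = value

    def load(self, addr):
        return self.cache.get(addr, self.durable.get(addr))

    def persist(self, addr):
        """Stand-in for a cache line flush followed by a fence."""
        if addr in self.cache:
            self.durable[addr] = self.cache.pop(addr)

    def crash(self):
        """Simulate a power failure: every unflushed store is dropped."""
        self.cache.clear()


def publish(scm, payload, flush_payload=True):
    """Write a record, then mark it valid; the payload must be durable first."""
    scm.store("payload", payload)
    if flush_payload:
        scm.persist("payload")
    scm.store("valid", True)
    scm.persist("valid")


def recover(scm):
    """Return (valid flag, payload) as seen after a restart."""
    return bool(scm.load("valid")), scm.load("payload")


def run(flush_payload):
    """Publish a record, crash, and report what recovery observes."""
    scm = SimulatedSCM()
    publish(scm, "row-42", flush_payload)
    scm.crash()
    return recover(scm)
```

    With the payload flushed before the flag, recovery sees a valid and complete record. Omitting that one flush leaves a durable flag pointing at a payload that never reached persistence, which is the kind of bug that ordinary testing without simulated power failures would not expose.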

    Selective transparency in distributed transaction processing

    PhD Thesis. Object-oriented programming languages provide a powerful interface for programmers to access the mechanisms necessary for reliable distributed computing. Using inheritance and polymorphism provided by the object model, it is possible to develop a hierarchy of classes to capture the semantics and inter-relationships of various levels of functionality required for distributed transaction processing. Using multiple inheritance, application developers can selectively apply transaction properties to suit the requirements of the application objects. In addition to the specific problems of (distributed) transaction processing in an environment of persistent objects, there is a need for a unified framework, or architecture in which to place this system. To be truly effective, not only the transaction manager, but the entire transaction support environment must be described, designed and implemented in terms of objects. This thesis presents an architecture for reliable distributed processing in which the management of persistence, provision of transaction properties (e.g., concurrency control), and organisation of support services (e.g., RPC) are all gathered into a unified design based on the object model. UK Science and Engineering Council: ESPRIT project.

    A Multimedia Prototype for Annotation and Illustration Using the Microsoft Foundation Class Library and C++

    One of the major claims of the object-oriented programming approach is that it facilitates the development of complex programs by allowing reuse of components. Most compilers for object-oriented languages are now supplied with class libraries. In addition to those provided with the compilers, there are many others in the public domain or available from commercial suppliers. Code reuse can be maximised through the exploitation of framework class libraries for creating interactive programs. A framework library can be viewed as providing a skeleton application that can be extended and specialised through class inheritance. The evolution of application frameworks is discussed briefly in Chapter 1 with an objective to utilise one of them to develop a prototype multimedia application for annotation and illustration. This prototype is referred to as Glasgow Graphics and Sound (GGS) in this thesis. GGS deals with externally created vector or bitmap images, graphics primitives and sound objects in any sequence. GGS is designed to provide the end-users with facilities to work on external images with free-hand curves and other graphics tools, record their voice, save everything in one disk file and animate them later, if necessary. GGS has the responsibility to store different objects without knowing in advance the sequence of object types the user will create. The implementation language, C++, does not have any built-in support for object persistence. Hence, a number of techniques and strategies for adding persistence to C++ objects are reviewed in Chapter 2. The Microsoft Foundation Class (MFC) library is selected as the application framework for developing GGS and the serialization mechanism in MFC is chosen to deal with the object persistence issues. Some of the techniques for persistence, discussed in Chapter 2, are powerful but incur unacceptable overheads for lightweight applications.
On the other hand, the MFC serialization is found very useful in creating a transportable stream of bytes that can be stored in a file and sent as an e-mail attachment. Chapter 3 presents the serialization internals in MFC and uncovers some undocumented details that are believed to be valuable for other MFC users. From an application programmer's viewpoint, it is straightforward to use the MFC serialization in most cases. However, the actual implementation details are complex. A sample data structure is serialized and analysed step-by-step to explain the MFC serialization mechanism. The user-friendliness of applications comes not only from an iconic user interface but also from a uniform user interface across applications. Some common user interface elements and their importance are discussed in Chapter 4 along with the document/view architecture in MFC that separates an application's data management code from its user interface code. The multiple document interface (MDI) in GGS is based on this document/view architecture. A case study walkthrough is presented, purely from an end-user's viewpoint, to illustrate a simple use of GGS. The main classes and their hierarchy are drafted in Chapter 4 based on a high-level decomposition of GGS. Chapter 5 presents the final class hierarchy, different drawing operations and other features involving graphics primitives. Template based type-safe collection classes are used in GGS to store pointers to objects of any type. This simplifies the interaction with the document class. Basic drawing operations such as moving, deleting and highlighting graphics primitives on the screen use an efficient raster drawing mode. The implementation of view magnification together with the standard scrolling capabilities in a window, which requires some special techniques, is also discussed. The benefits of trapping some uncommon messages from the operating system are also discussed.
Chapter 5 ends with an overview of the printing process and a description of the multi-page printing features in GGS. Chapter 6 starts with a general discussion on bitmaps and metafiles. A bitmap is a complete digital representation of a picture. Each pixel in the image corresponds to one or more bits in the bitmap. A metafile, on the other hand, stores pictorial information as a series of records that correspond directly to the graphics device interface (GDI) calls. GGS can import externally created bitmaps and metafiles and treat them like any other graphic or sound objects. All commercial illustration programs do something similar. However, the motivation for developing GGS is slightly different. GGS allows the users to construct and manipulate a fairly complex picture, adding comments as they go. The process of constructing the picture is saved, not just the final picture. Sound can be an effective form of information and interface enhancement when appropriately used. It can serve purposes other than the transmission of details or factual information.

    Virtual files: a Framework for Experimental Design

    The increasing power and decreasing cost of computers have resulted in them being applied in an ever-widening area. In the world of Computer Aided Design it is now practicable to involve the machine in the earlier stages where a design is still speculative, as well as in the later stages where the computer's calculating ability becomes paramount. Research on database systems has not followed this trend, concentrating instead on commercial applications, with the result that there are very few systems targeted at the early stages of the design process. In this thesis we consider the design and implementation of the file manager for such a system, first of all from the point of view of a single designer working on an entire design, and then from the point of view of a team of designers, each working on a separate aspect of a design. We consider the functionality required of the type of system we are proposing, defining the terminology of experiments to describe it. Having ascertained our requirements we survey current database technology in order to determine to what extent it meets our requirements. We consider traditional concurrency control methods and conclude that they are incompatible with our requirements. We consider current data models and conclude that, with the exception of the persistent programming model, they are not appropriate in the context required, while the implementation of the persistent programming model provides transactions on data structures but not experiments. The implementation of experiments is considered. We examine a number of potential methods, deciding on differential files as the one most likely both to meet our requirements and to have the lowest overheads. Measurements conducted on both a preliminary and a full-scale implementation confirm that this is the case.
There are, nevertheless, further gains in convenience and performance to be obtained by exploiting the capabilities of the hardware to the full; we discuss these in relation to virtual memory systems, with particular reference to the VAX/VMS environment. Turning to the case where several designers are each working on a (nearly) distinct part of a design, we consider how to detect conflicts between experiments. Basing our approach on optimistic concurrency control methods, we show how read and write sets may be used to determine those areas of the database where conflicts might arise. As an aside, we show how the methods we propose can be used in an alternative approach to optimistic concurrency control, giving a reduction in system overheads for certain applications. We consider implementation techniques, concluding that a differential files approach has significant advantages in maintaining write sets, while a two-level bitmap may be used to maintain read sets efficiently.
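
    The link between differential files and read/write sets can be sketched as follows. This is an illustration under assumptions, not the thesis's implementation: the `Experiment` class and `possible_conflicts` function are hypothetical, blocks are identified by integers, and a plain Python `set` stands in for the two-level bitmap the thesis proposes for read sets.

```python
class Experiment:
    """An experiment over a shared base file, with changes kept in a differential file."""

    def __init__(self, base):
        self.base = base       # shared mapping: block number -> contents (never modified)
        self.diff = {}         # differential file: only the blocks this experiment changed
        self.read_set = set()  # blocks this experiment has read

    def read(self, block):
        self.read_set.add(block)
        # Changed blocks come from the differential file, the rest from the base.
        return self.diff.get(block, self.base.get(block))

    def write(self, block, data):
        self.diff[block] = data

    @property
    def write_set(self):
        # The write set falls out of the differential file for free.
        return set(self.diff)


def possible_conflicts(a, b):
    """Blocks written by one experiment and read or written by the other."""
    return (a.write_set & (b.read_set | b.write_set)) | (b.write_set & a.read_set)


def demo():
    base = {1: "a", 2: "b", 3: "c"}
    x, y = Experiment(base), Experiment(base)
    x.read(1)
    x.write(2, "B")
    y.read(2)
    y.write(3, "C")
    return possible_conflicts(x, y), x.read(2), base[2]
```

    Because every write lands in the differential file, the write set needs no separate bookkeeping, which is one way to read the advantage the abstract attributes to the differential files approach.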

    Attribute-Level Versioning: A Relational Mechanism for Version Storage and Retrieval

    Data analysts today have at their disposal a seemingly endless supply of data and repositories, and hence datasets, from which to draw. New datasets become available daily, making the choice of which dataset to use difficult. Furthermore, traditional data analysis has been conducted using structured data repositories such as relational database management systems (RDBMS). These systems, by their nature and design, prohibit duplication for indexed collections, forcing analysts to choose one value for each of the available attributes for an item in the collection. Often analysts discover two or more datasets with information about the same entity. When combining this data and transforming it into a form that is usable in an RDBMS, analysts are forced to deconflict the collisions and choose a single value for each duplicated attribute containing differing values. This deconfliction is the source of a considerable amount of guesswork and speculation on the part of the analyst in the absence of professional intuition. One must consider what is lost by discarding those alternative values. Are there relationships between the conflicting datasets that have meaning? Is each dataset presenting a different and valid view of the entity or are the alternate values erroneous? If so, which values are erroneous? Is there a historical significance of the variances? The analysis of modern datasets requires the use of specialized algorithms and storage and retrieval mechanisms to identify, deconflict, and assimilate variances of attributes for each entity encountered. These variances, or versions of attribute values, contribute meaning to the evolution and analysis of the entity and its relationship to other entities. A new, distinct storage and retrieval mechanism will enable analysts to efficiently store, analyze, and retrieve the attribute versions without unnecessary complexity or additional alterations of the original or derived dataset schemas.
This paper presents technologies and innovations that assist data analysts in discovering meaning within their data and preserving all of the original data for every entity in the RDBMS.
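
    The problem of keeping every conflicting attribute value, rather than choosing one, can be illustrated with a small relational sketch. The abstract does not specify the paper's mechanism, so everything here is an assumption made for illustration: the `attr_version` table, its columns, the sample rows, and the helper names are all hypothetical.

```python
import sqlite3


def build():
    """Create an in-memory table holding one row per (entity, attribute, value, source)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE attr_version ("
        "entity TEXT, attribute TEXT, value TEXT, source TEXT)"
    )
    rows = [
        ("acme", "city", "Boston", "dataset_a"),     # two datasets disagree
        ("acme", "city", "Cambridge", "dataset_b"),  # on the same attribute
        ("acme", "founded", "1990", "dataset_a"),
    ]
    db.executemany("INSERT INTO attr_version VALUES (?, ?, ?, ?)", rows)
    return db


def versions(db, entity, attribute):
    """Return every recorded value of one attribute, with the dataset it came from."""
    cur = db.execute(
        "SELECT value, source FROM attr_version "
        "WHERE entity = ? AND attribute = ? ORDER BY source",
        (entity, attribute),
    )
    return cur.fetchall()
```

    Querying `versions(build(), "acme", "city")` returns both values along with their sources, so no alternative is discarded at load time and the analyst can defer or avoid deconfliction.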

    Virtual memory management for database systems

