
    Client-based Logging: A New Paradigm of Distributed Transaction Management

    The proliferation of inexpensive workstations and networks has created a new era in distributed computing. At the same time, non-traditional applications such as computer-aided design (CAD), computer-aided software engineering (CASE), geographic information systems (GIS), and office information systems (OIS) have placed increased demands for high-performance transaction processing on database systems. The combination of these factors gives rise to significant challenges in the design of modern database systems. In this thesis, we propose novel techniques that aim to improve the performance and scalability of these new database systems. These techniques exploit client resources through client-based transaction management, which is realized by providing logging facilities locally even when data is shared in a global environment. This thesis presents several recovery algorithms that use client disks to store recovery-related information (i.e., log records). Our algorithms work with both coarse- and fine-granularity locking, and they never require the merging of client logs. Moreover, they support fine-granularity locking with multiple clients permitted to update different portions of the same database page concurrently. The database state is recovered correctly after a complex crash, and even when updates performed by different clients on a page are missing from the disk version of the page although some of the updating transactions have committed. The thesis also presents an implementation of the proposed algorithms in a memory-mapped storage manager, together with a detailed performance study of these algorithms using the OO1 database benchmark. The performance results show that client-based logging is superior to traditional server-based logging: it reduces dependencies on server CPU and disk resources and thus delays the point at which the server becomes a performance bottleneck as the number of clients accessing the database grows.
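    As a rough illustration of the client-based logging idea (not the thesis's actual algorithms), the sketch below shows a client appending redo records to a log on its own local disk and forcing that log at commit, so the server's CPU and disk are not on the commit path. All names and the record layout are assumptions.

```python
import os
import struct

class ClientLog:
    """Append-only redo log on the client's local disk (illustrative)."""

    def __init__(self, path):
        self.f = open(path, "ab", buffering=0)  # unbuffered appends
        self.lsn = 0  # client-local log sequence number

    def append(self, page_id, offset, after_image):
        """Write one redo record: (lsn, page, offset, length, data)."""
        self.lsn += 1
        header = struct.pack(">QQII", self.lsn, page_id, offset, len(after_image))
        self.f.write(header + after_image)
        return self.lsn

    def force(self):
        """Flush the log to the client's disk (the WAL rule for commit)."""
        os.fsync(self.f.fileno())

def commit(updates, log):
    """Commit path: log all updates locally, then force the local log.
    No server CPU or disk I/O is needed to make the transaction durable."""
    for page_id, offset, data in updates:
        log.append(page_id, offset, data)
    log.force()
```

    Under fine-granularity locking, different clients could log updates to disjoint byte ranges of the same page in their own local logs; recovering such a page correctly without ever merging the client logs is the hard part the thesis addresses.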

    Dynamic Query Scheduling in Parallel Data Warehouses

    Data warehouse queries pose challenging performance problems that often necessitate the use of parallel database systems (PDBS). Although dynamic load balancing is of key importance in PDBS, to our knowledge it has not yet been investigated thoroughly for parallel data warehouses. In this study, we propose a scheduling strategy that considers both processors and disks simultaneously while exploiting the load-balancing potential of a Shared Disk architecture. We compare the performance of this new method with several other approaches in a comprehensive simulation study that incorporates data skew and typical data warehouse features such as star schemas.
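    The abstract does not give the scheduling algorithm itself; the following is only a plausible sketch of what "considering processors and disks simultaneously" can mean in a Shared Disk system, where any CPU may read any disk. The task representation and cost model are assumptions.

```python
def schedule(tasks, num_cpus, num_disks):
    """Greedy two-resource scheduler (illustrative cost model).

    Each task is (cpu_cost, disk_id, io_cost). In a Shared Disk
    architecture every CPU can read every disk, so CPU assignment
    is not constrained by data placement.
    """
    cpu_load = [0.0] * num_cpus
    disk_load = [0.0] * num_disks
    plan = []
    # Place expensive tasks first so smaller ones can fill the gaps.
    for cpu_cost, disk_id, io_cost in sorted(tasks, reverse=True):
        # Choose the CPU that minimizes the bottleneck resource after
        # assignment: the larger of the CPU's and the disk's new load.
        best = min(range(num_cpus),
                   key=lambda c: max(cpu_load[c] + cpu_cost,
                                     disk_load[disk_id] + io_cost))
        cpu_load[best] += cpu_cost
        disk_load[disk_id] += io_cost
        plan.append((best, disk_id))
    return plan

print(schedule([(4.0, 0, 2.0), (3.0, 1, 5.0), (1.0, 0, 1.0)],
               num_cpus=2, num_disks=2))
```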

    Multi-Dimensional Database Allocation for Parallel Data Warehouses

    Data allocation is a key performance factor for parallel database systems (PDBS). This holds especially for data warehousing environments, where huge amounts of data and complex analytical queries have to be dealt with. While there are several studies on data allocation for relational PDBS, the specific requirements of data warehouses have not yet been sufficiently addressed. In this study, we consider the allocation of relational data warehouses based on a star schema and utilizing bitmap index structures. We investigate how a multi-dimensional hierarchical fragmentation of the fact table supports queries referencing different subsets of the schema dimensions. Our analysis is based on realistic parameters derived from a decision support benchmark, and the performance implications of different allocation choices are evaluated by means of a detailed simulation model.
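    To make "multi-dimensional fragmentation" concrete, here is a minimal sketch (with invented dimension names) that fragments a star-schema fact table by two dimensions and shows how a query restricted on one dimension touches only the matching fragments. The paper's hierarchical levels, bitmap indexes, and allocation to disks are not reproduced here.

```python
from collections import defaultdict

def fragment(fact_rows):
    """Fragment the fact table by (month, region): each fragment holds
    the fact rows of one cell of the two-dimensional grid."""
    fragments = defaultdict(list)
    for row in fact_rows:
        fragments[(row["month"], row["region"])].append(row)
    return fragments

def scan(fragments, month=None, region=None):
    """A query restricted on a subset of the dimensions reads only
    the fragments whose key matches the restriction."""
    for (m, r), rows in fragments.items():
        if (month is None or m == month) and (region is None or r == region):
            yield from rows

rows = [{"month": 1, "region": "EU", "sales": 10},
        {"month": 1, "region": "US", "sales": 7},
        {"month": 2, "region": "EU", "sales": 3}]
frags = fragment(rows)
print(sum(r["sales"] for r in scan(frags, region="EU")))  # -> 13
```

    In the paper's setting, such fragments would additionally be allocated across the disks of the PDBS so that partial scans like the one above can proceed in parallel.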

    Research Collaboration Influence Analysis Using Dynamic Co-authorship and Citation Networks

    Collaborative research is increasing in terms of publications, skills, and formal interactions, making it a focal point in both academia and industry. Understanding the factors and behavior of a dynamic collaboration network provides insights that help improve a researcher's profile and a coordinator's research productivity. Despite rapid developments in the research collaboration process, the validity of its outcomes remains difficult to assess. Existing approaches have used bibliometric network analysis from different angles to understand collaboration patterns and measure the quality of the corresponding relationships. We therefore investigate an efficient method to gauge the credibility of findings in publication-author relations. In this research, we propose a new collaboration-analysis method that examines the structure of research articles using four types of graphs to discern authors' influence. We apply different combinations of network relationships and bibliometric analysis based on the g-index to reveal their interrelated differences. Our model is designed to find dynamic indicators of co-authored collaboration that influence an author's behavior in terms of changes in research area or interest. We investigate the dynamic relations in an academic field using metadata of openly available articles and internationally collaborating authors in interrelated domains. Based on filtered evidence from the relationship networks and their statistical results, the study shows increasing productivity and growing influence over time.
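    The analysis above is based on the g-index. For reference, a minimal implementation of the standard definition (the largest g such that the g most-cited papers together collect at least g² citations):

```python
def g_index(citations):
    """Largest g such that the top g papers have >= g*g citations in total."""
    cites = sorted(citations, reverse=True)
    total, g = 0, 0
    for i, c in enumerate(cites, start=1):
        total += c
        if total >= i * i:
            g = i
    return g

print(g_index([10, 8, 5, 4, 3]))  # top 5 papers: 30 >= 25, so g = 5
```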

    Towards Scalable OLTP Over Fast Networks

    Online Transaction Processing (OLTP) underpins real-time data processing in many mission-critical applications, from banking to e-commerce. These applications typically issue short-duration, latency-sensitive transactions that demand immediate processing. High-volume applications, such as Alibaba's e-commerce platform, achieve peak rates as high as 70 million transactions per second, exceeding the capacity of a single machine; instead, distributed OLTP database management systems (DBMS) are deployed across multiple powerful machines. Historically, such distributed OLTP DBMSs have been designed primarily to avoid network communication, a paradigm largely unchanged since the 1980s. However, fast networks challenge the conventional belief that network communication is the main bottleneck. In particular, emerging network technologies such as Remote Direct Memory Access (RDMA) radically alter how data can be accessed over a network: RDMA's primitives allow direct access to the memory of a remote machine within an order of magnitude of local memory access latency. Because traditional distributed database systems were designed on the premise that the network is slow, they cannot efficiently exploit these fast network primitives, which requires us to reconsider how we design distributed OLTP systems. This thesis focuses on the challenges RDMA presents and its implications for the design of distributed OLTP systems. First, we examine distributed architectures to understand data access patterns and scalability in modern OLTP systems. Drawing on these insights, we advocate a distributed storage engine optimized for high-speed networks. The storage engine serves as the foundation of a database, ensuring efficient data access through three central components: indexes, synchronization primitives, and buffer management (caching). With the introduction of RDMA, the landscape of data access has undergone a significant transformation, requiring a comprehensive redesign of these components to exploit the potential of RDMA and similar high-speed network technologies. Thus, as the second contribution, we design RDMA-optimized tree-based indexes, which are especially applicable to disaggregated databases that access remote data. We then turn our attention to the unique challenges of RDMA. One-sided RDMA, one of the network primitives RDMA introduces, offers a performance advantage by enabling remote memory access while bypassing the remote CPU and operating system. This allows the remote CPU to process transactions uninterrupted, with no need to be on hand for network communication. However, specialized one-sided synchronization primitives are then required, since the traditional CPU-driven primitives are bypassed. We found that existing one-sided RDMA synchronization schemes are unscalable or, even worse, fail to synchronize correctly, leading to hard-to-detect data corruption. As our third contribution, we address this issue by offering guidelines for building scalable and correct one-sided RDMA synchronization primitives. Finally, recognizing that keeping all data in memory becomes economically unattractive, we propose a distributed buffer manager design that efficiently utilizes cost-effective NVMe flash storage. By leveraging low-latency RDMA messages, our buffer manager provides a transparent memory abstraction that accesses the aggregated DRAM and NVMe storage across nodes. Central to our approach is a distributed caching protocol that caches data dynamically. With this approach, our system can outperform RDMA-enabled in-memory distributed databases while managing larger-than-memory datasets efficiently.
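    To illustrate why one-sided synchronization is subtle, the sketch below simulates a common optimistic pattern for one-sided reads: the reader fetches a version word, reads the record, and re-reads the version to detect a concurrent writer. This is a generic pattern, not the thesis's specific guidelines, and the "remote memory" here is just a local dictionary standing in for RDMA READ operations.

```python
import threading

# A record guarded by a version word: an odd version means a write
# is in progress (a seqlock-style scheme).
memory = {"version": 0, "payload": (0, 0)}
lock = threading.Lock()  # stands in for the writer's local latch

def remote_read():
    """Optimistically validated read, as a one-sided RDMA reader would do."""
    while True:
        v1 = memory["version"]
        if v1 % 2 == 1:
            continue              # writer in progress, retry
        data = memory["payload"]  # one-sided READ of the record
        v2 = memory["version"]
        if v1 == v2:
            return data           # no concurrent write observed
        # version changed under us: retry

def local_write(a, b):
    with lock:
        memory["version"] += 1    # enter critical section (odd)
        memory["payload"] = (a, b)
        memory["version"] += 1    # leave critical section (even)

print(remote_read())  # -> (0, 0) when no writer is active
```

    On real RDMA hardware the two version fetches and the record read are separate network operations with no cache coherence between them, which is exactly why naively built schemes can return torn data; the thesis's guidelines concern making such primitives both scalable and correct.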

    Recovery and coherency-control protocols for fast intersystem page transfer and fine-granularity locking in a shared disks transaction environment

    This paper proposes schemes for fast page transfer between transaction system instances in a shared disks (SD) environment where all the sharing instances can read and modify the same data. Fast page transfer improves transaction response time and concurrency because one or more disk I/Os are avoided when transferring a page from a system that modified it to another system that needs it. The proposed methods work with the steal and no-force buffer management policies and with fine-granularity (e.g., record) locking. For each of the page-transfer schemes, we present both recovery and coherency-control protocols. Updates can be made to a page by several systems before the page is written to disk, and many subtleties involved in correctly recovering such a page in the face of single-system or complex-wide failures are discussed. Assuming that each system maintains its own log, some methods require a merged log for restart recovery while others do not. Our proposals should also apply to distributed, recoverable file systems and distributed virtual memory in the SD environment, and to the currently popular client-server object-oriented DBMS environments where clients cache data.
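    As a rough sketch of fast intersystem page transfer (invented names; the paper's recovery and coherency-control protocols are far more involved): instead of writing a dirty page to disk so another instance can read it, the owning instance ships the page and its page-LSN directly over the network.

```python
class Instance:
    """One transaction system instance in a shared-disks cluster (illustrative)."""

    def __init__(self, name):
        self.name = name
        self.buffer = {}  # page_id -> (page_lsn, page bytes)

    def update(self, page_id, page_lsn, data):
        # The update is logged in this instance's own log (not shown)
        # before the buffered page is modified (WAL).
        self.buffer[page_id] = (page_lsn, data)

    def send_page(self, page_id, other):
        """Fast transfer: ship the dirty page directly to the requester,
        avoiding the disk write plus disk read of a disk-based transfer."""
        other.buffer[page_id] = self.buffer[page_id]

a, b = Instance("A"), Instance("B")
a.update(7, page_lsn=101, data=b"v1")
a.send_page(7, b)  # no disk I/O on the transfer path
b.update(7, page_lsn=102, data=b"v2")
# Page 7 now carries updates from both instances before ever reaching
# disk; recovering it after a crash may require records from both
# logs, which is the crux of the paper's recovery protocols.
```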