11,808 research outputs found

    Communication Facilities for Distributed Transaction Processing Systems

    Distributed transaction-processing systems must manage such functions as concurrency, recovery, and replication. One way to improve their efficiency and reliability is to increase software modularity, which means the separate components should execute in separate address spaces to permit hardware-enforced separation. This structure offers advantages but demands efficient interprocess communication (IPC) services. In our research at Purdue University, we are investigating mechanisms and paradigms for efficient communication support on conventional architectures, such as virtual-memory, single-processor machines with no special IPC hardware support. (Some mainframes have hardware assistance with which more than one address space can be accessed at the same time.) We are studying communication designs in the context of the Raid system, a robust and adaptable distributed database system for transaction processing. Raid has been developed at Purdue on Sun workstations under the Unix operating system in a local area network.

    Communication software is critical in distributed computing systems. This research identifies efficient mechanisms and paradigms for distributed transaction processing in a replicated database environment. In Raid, each major logical component is implemented as a server, which is a process in a separate address space. Servers interact with other processes through a high-level communication subsystem. Currently, Raid has six servers for transaction management: the user interface (UI), the action driver (AD), the access manager (AM), the atomicity controller (AC), the concurrency controller (CC), and the replication controller (RC). High-level name service is provided by a separate server, the oracle.

    Raid's communication software, called Raidcomm, has evolved as a result of the knowledge we gained from other systems and from our own experiments, which are summarized in the following sections. The first version, Raidcomm V.1, was developed in 1986. Implemented on top of the SunOS socket-based IPC mechanism using UDP/IP (User Datagram Protocol/Internet Protocol), it provides a clean, location-independent interface between the servers. To permit defining server interfaces in terms of arbitrary data structures, we used Sun's external data representation standard, XDR. We developed Raidcomm V.2 in 1990 to provide multicasting support for the AC and RC servers. We designed Raidcomm V.3 …
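
    As a rough illustration of the messaging style the abstract describes, the sketch below (Python, standard library only) packs a request the way XDR does (big-endian 32-bit integers, length-prefixed strings padded to a 4-byte boundary) and sends it to a server as a single UDP datagram. This is a minimal sketch: the port number, message layout, and "LOCK" request are invented placeholders, not the actual Raidcomm formats.

        import socket
        import struct

        def xdr_pack_string(s: bytes) -> bytes:
            """Length-prefix a byte string and zero-pad to a 4-byte boundary, as XDR does."""
            pad = (4 - len(s) % 4) % 4
            return struct.pack(">I", len(s)) + s + b"\x00" * pad

        def send_request(host: str, port: int, txn_id: int, payload: bytes) -> None:
            """Send one request datagram to a Raid-style server over UDP."""
            # ">I" is a big-endian unsigned 32-bit int, matching XDR's integer encoding.
            message = struct.pack(">I", txn_id) + xdr_pack_string(payload)
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
                sock.sendto(message, (host, port))

        # Hypothetical usage: ask a concurrency-controller server to lock a
        # record on behalf of transaction 42. Port and request text are invented.
        send_request("localhost", 7001, 42, b"LOCK item17")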

    On-Disk Data Processing: Issues and Future Directions

    In this paper, we present a survey of "on-disk" data processing (ODDP). ODDP, a form of near-data processing, refers to a computing arrangement in which the secondary storage drives themselves have data-processing capability. Proposed ODDP schemes vary widely in their processing capability, target applications, architecture, and the kind of storage drive employed. Some schemes provide only a specific but heavily used operation, such as sort, whereas others provide a full range of operations. Recently, with the advent of solid-state drives, powerful and extensive ODDP solutions have been proposed. We present a thorough review of the architectures developed for the different on-disk processing approaches, discuss current and future challenges, and identify the directions ODDP can take.
    (Comment: 24 pages, 17 figures, 3 tables)
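
    To make the near-data-processing idea concrete, the sketch below contrasts the conventional path (ship every record to the host and filter there) with an ODDP-style path (ship the predicate to the drive and receive only the matches, so less data crosses the I/O bus). The SimulatedSmartDrive class is invented purely for illustration and does not correspond to any particular ODDP proposal covered by the survey.

        from typing import Callable, Iterator

        class SimulatedSmartDrive:
            """Stand-in for a storage drive with an embedded processor."""

            def __init__(self, records: list[bytes]):
                self._records = records          # stand-in for on-disk blocks

            def read_all(self) -> Iterator[bytes]:
                # Conventional path: every record travels to the host,
                # which must then filter the stream itself.
                yield from self._records

            def scan(self, predicate: Callable[[bytes], bool]) -> Iterator[bytes]:
                # ODDP-style path: the predicate runs "on the drive", so only
                # matching records are returned to the host.
                yield from (r for r in self._records if predicate(r))

        drive = SimulatedSmartDrive([b"alice,30", b"bob,17", b"carol,45"])
        adults = list(drive.scan(lambda rec: int(rec.split(b",")[1]) >= 18))
        print(adults)   # only the two matching records reach the host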

    The CDF Data Handling System

    The Collider Detector at Fermilab (CDF) records proton-antiproton collisions at a center-of-mass energy of 2.0 TeV at the Tevatron collider. A new collider run of the Tevatron, Run II, started in April 2001. Increased luminosity will result in about 1 PB of data recorded on tape in the next two years. Currently the CDF experiment has about 260 TB of data stored on tape, comprising raw and reconstructed data and their derivatives. Data storage and retrieval are managed by the CDF Data Handling (DH) system. This system was designed to accommodate the increased demands of the Run II environment and has proven robust in providing a reliable flow of data from the detector to the end user. This paper gives an overview of the CDF Run II Data Handling system, which has evolved significantly over the course of this year, and outlines the future direction of the system.
    (Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, CA, USA, March 2003; 7 pages, LaTeX, 4 EPS figures, PSN THKT00)

    Middleware-based Database Replication: The Gaps between Theory and Practice

    The need for high availability and performance in data management systems has fueled a long-running interest in database replication from both academia and industry. However, academic groups often attack replication problems in isolation, overlooking the need for completeness in their solutions, while commercial teams take a holistic approach that often misses opportunities for fundamental innovation. Over time, this has created a gap between academic research and industrial practice. This paper aims to characterize the gap along three axes: performance, availability, and administration. We build on our own experience developing and deploying replication systems in commercial and academic settings, as well as on a large body of prior related work. We sift through representative examples from the last decade of open-source, academic, and commercial database replication systems and combine this material with case studies from real systems deployed at Fortune 500 customers. We propose two agendas, one for academic research and one for industrial R&D, which we believe can bridge the gap within 5-10 years. In this way, we hope to both motivate and help researchers in making the theory and practice of middleware-based database replication more relevant to each other.
    (Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on Management of Data, Vancouver, Canada, June 200…)

    MongoDB Performance In The Cloud

    Web applications are growing at a staggering rate. As they become more complex, their data storage requirements tend to grow exponentially, and databases play an important role in how web applications store their information. MongoDB is a document-store database that does not impose the strict schemas RDBMSs require and can grow horizontally without performance degradation. MongoDB opens up possibilities for different storage scenarios and allows programmers to use a database that fits their needs, not the other way around. Scaling MongoDB horizontally can require tens to hundreds of servers, making such a setup very difficult to afford on dedicated hardware. Moving the database into the cloud opens up the possibility of renting low-cost virtual machine instances instead. There are many cloud services to choose from, and without testing each one there is very little performance information available. This paper provides benchmarks on the performance of MongoDB in the cloud.
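
    As a small illustration of the schemaless document model the abstract mentions, the following PyMongo sketch stores differently shaped documents in a single collection and queries by example. The connection URI, database, and collection names are placeholders; the snippet assumes a reachable MongoDB instance and an installed pymongo driver.

        from pymongo import MongoClient

        # Placeholder URI; point this at a real MongoDB deployment.
        client = MongoClient("mongodb://localhost:27017")
        db = client["webapp"]

        # No schema to declare up front: documents in one collection may differ.
        db.events.insert_one({"user": "alice", "action": "login"})
        db.events.insert_one({"user": "bob", "action": "purchase", "amount": 19.99})

        # Query by example; documents lacking a field simply don't match.
        for event in db.events.find({"action": "purchase"}):
            print(event["user"], event.get("amount"))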