Communication Facilities for Distributed Transaction Processing Systems
Distributed transaction-processing systems must manage such functions as concurrency, recovery, and replication. One way to improve their efficiency and reliability is to increase software modularity, which means the separate components should execute in separate address spaces to permit hardware-enforced separation. This structure offers advantages but demands efficient interprocess communication (IPC) services. In our research at Purdue University, we are investigating mechanisms and paradigms for efficient communication support in conventional architectures, such as virtual-memory, single-processor machines with no special IPC hardware support. (Some mainframes have hardware assistance that lets more than one address space be accessed at the same time.) We are studying communication designs in the context of the Raid system, a robust and adaptable distributed database system for transaction processing. Raid has been developed at Purdue on Sun workstations under the Unix operating system in a local area network. Communication software is critical in distributed computing systems, and this research identifies efficient mechanisms and paradigms for distributed transaction processing in a replicated database environment.

In Raid, each major logical component is implemented as a server, which is a process in a separate address space. Servers interact with other processes through a high-level communication subsystem. Currently, Raid has six servers for transaction management: the user interface (UI), the action driver (AD), the access manager (AM), the atomicity controller (AC), the concurrency controller (CC), and the replication controller (RC). High-level name service is provided by a separate server, the oracle.

Raid's communication software, called Raidcomm, has evolved as a result of the knowledge we gained from other systems and from our own experiments, which are summarized in the following sections. The first version, Raidcomm V.1, was developed in 1986. Implemented on top of the SunOS socket-based IPC mechanism using UDP/IP (User Datagram Protocol/Internet Protocol), it provides a clean, location-independent interface between the servers. To permit defining server interfaces in terms of arbitrary data structures, we used Sun's external data representation standard, XDR. We developed Raidcomm V.2 in 1990 to provide multicasting support for the AC and RC servers. We designed Raidcomm V.3 t…
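The flavor of the Raidcomm V.1 design (datagram IPC between servers in separate address spaces, with a machine-independent wire encoding) can be sketched as follows. The original is C on SunOS sockets with XDR; this minimal Python stand-in uses JSON in place of XDR, and the port number and lock-request message are illustrative, not taken from Raid.

    # Minimal sketch of datagram-based server IPC in the spirit of
    # Raidcomm V.1 (UDP plus a machine-independent wire format).
    # Python stands in for the original C/XDR code; all names and
    # message fields here are illustrative, not taken from Raid.
    import json
    import socket

    ADDR = ("127.0.0.1", 9500)  # hypothetical port for a CC-like server

    def serve_once():
        """Receive one request datagram and send back a reply."""
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.bind(ADDR)
            data, peer = s.recvfrom(4096)
            req = json.loads(data)          # stand-in for XDR decoding
            reply = {"txn": req["txn"], "granted": True}
            s.sendto(json.dumps(reply).encode(), peer)

    def request_lock(txn_id):
        """Client side: ask the server to grant a lock for txn_id."""
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(1.0)               # UDP is unreliable; time out
            s.sendto(json.dumps({"txn": txn_id, "op": "lock"}).encode(), ADDR)
            reply, _ = s.recvfrom(4096)
            return json.loads(reply)

Run serve_once in one process and request_lock in another; the point is only that each server lives in its own address space and is reached through a small, location-independent message interface.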
On-Disk Data Processing: Issues and Future Directions
In this paper, we present a survey of "on-disk" data processing (ODDP). ODDP,
a form of near-data processing, refers to computing arrangements in which the
secondary storage drives themselves have data processing capability. Proposed
ODDP schemes vary widely in their processing capability, target applications,
architecture, and the kind of storage drive employed. Some schemes provide
only a single heavily used operation, such as sort, whereas others provide a
full range of operations. Recently, with the advent of solid-state drives,
powerful and extensive ODDP solutions have been proposed. We present a
thorough review of the architectures developed for the different on-disk
processing approaches, discuss current and future challenges, and identify
the directions that ODDP can take.
Comment: 24 pages, 17 figures, 3 tables
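To make the ODDP idea concrete, here is a toy Python model of pushing a filter predicate down to a "smart" drive: the conventional path ships every record to the host, while the ODDP path evaluates the filter near the data and returns only the matches. The SmartDrive class and its methods are hypothetical illustrations, not the API of any surveyed scheme.

    # Toy model of the ODDP idea: the drive applies a predicate near
    # the data and ships back only matching records, instead of the
    # host pulling every record over the I/O bus. Purely illustrative.
    class SmartDrive:
        def __init__(self, records):
            self.records = records          # data resident on the drive

        def scan(self):
            # Conventional path: return every record to the host.
            return list(self.records)

        def scan_filtered(self, pred):
            # ODDP path: evaluate the predicate on the drive itself.
            return [r for r in self.records if pred(r)]

    drive = SmartDrive(range(1_000_000))
    # Host-side filtering moves 1,000,000 records across the bus...
    hot = [r for r in drive.scan() if r % 1000 == 0]
    # ...while on-disk filtering moves only the 1,000 matches.
    hot2 = drive.scan_filtered(lambda r: r % 1000 == 0)
    assert hot == hot2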
The CDF Data Handling System
The Collider Detector at Fermilab (CDF) records proton-antiproton collisions
at a center-of-mass energy of 2.0 TeV at the Tevatron collider. A new collider
run, Run II, of the Tevatron started in April 2001. Increased luminosity will
result in about 1 PB of data recorded on tape over the next two years.
Currently the CDF experiment has about 260 TB of data stored on tape; this
amount includes raw and reconstructed data and their derivatives.
Data storage and retrieval are managed by the CDF Data Handling (DH) system.
This system has been designed to accommodate the increased demands of the Run
II environment and has proven robust in delivering a reliable flow of data
from the detector to the end user. This paper gives an overview of the CDF Run
II Data Handling system, which has evolved significantly over the course of
this year, and outlines the future direction of the system.
Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics
(CHEP03), La Jolla, CA, USA, March 2003, 7 pages, LaTeX, 4 EPS figures, PSN
THKT00
Middleware-based Database Replication: The Gaps between Theory and Practice
The need for high availability and performance in data management systems has
been fueling a long running interest in database replication from both academia
and industry. However, academic groups often attack replication problems in
isolation, overlooking the need for completeness in their solutions, while
commercial teams take a holistic approach that often misses opportunities for
fundamental innovation. This has created over time a gap between academic
research and industrial practice.
This paper aims to characterize the gap along three axes: performance,
availability, and administration. We build on our own experience developing and
deploying replication systems in commercial and academic settings, as well as
on a large body of prior related work. We sift through representative examples
from the last decade of open-source, academic, and commercial database
replication systems and combine this material with case studies from real
systems deployed at Fortune 500 customers. We propose two agendas, one for
academic research and one for industrial R&D, which we believe can bridge the
gap within 5-10 years. This way, we hope to both motivate and help researchers
in making the theory and practice of middleware-based database replication more
relevant to each other.
Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on
Management of Data, Vancouver, Canada, June 2008
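A minimal sketch of the pattern the paper studies, a middleware layer interposed between clients and unmodified database replicas that propagates writes to every copy and load-balances reads, might look like the following Python. The Replica and ReplicationMiddleware classes are hypothetical; real systems replace the naive write-all loop with group communication, certification, or a scheduler.

    # Minimal sketch of middleware-based replication: a layer between
    # clients and unmodified databases that sends writes to every
    # replica and round-robins reads. Hypothetical classes, not the
    # API of any cited system.
    import itertools

    class Replica:
        def __init__(self, name):
            self.name, self.store = name, {}
        def execute(self, op, key, value=None):
            if op == "write":
                self.store[key] = value
            return self.store.get(key)

    class ReplicationMiddleware:
        def __init__(self, replicas):
            self.replicas = replicas
            self._rr = itertools.cycle(replicas)   # round-robin reads
        def write(self, key, value):
            # Simplest possible "write-all"; production middleware
            # adds ordering, certification, and failure handling here.
            for r in self.replicas:
                r.execute("write", key, value)
        def read(self, key):
            return next(self._rr).execute("read", key)

    mw = ReplicationMiddleware([Replica("r1"), Replica("r2")])
    mw.write("x", 42)
    assert mw.read("x") == 42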
MongoDB Performance In The Cloud
Web applications are growing at a staggering rate, and as they become more complex their data storage requirements tend to grow exponentially. Databases play an important role in how web applications store their information. MongoDB is a document-store database that does not impose the strict schemas RDBMSs require and can grow horizontally without performance degradation. MongoDB opens up possibilities for different storage scenarios and allows programmers to use the database as storage that fits their needs, rather than the other way around. Scaling MongoDB horizontally can require tens to hundreds of servers, making such a setup very difficult to afford on dedicated hardware. Moving the database into the cloud opens up the possibility of low-cost virtual machine instances at reasonable prices. There are many cloud services to choose from, and without testing each one there is very little performance information available. This paper provides benchmarks on the performance of MongoDB in the cloud.
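A benchmark of the kind the paper describes can be sketched with pymongo: time a bulk insert and a query against a running MongoDB instance. The URI, database and collection names, and document count below are placeholders, and the timings say nothing about any particular cloud provider.

    # Minimal sketch of a MongoDB benchmark: time bulk inserts and a
    # range query. Assumes pymongo is installed and a server is
    # reachable at MONGO_URI (URI, names, and N are placeholders).
    import time
    from pymongo import MongoClient

    MONGO_URI = "mongodb://localhost:27017"
    N = 10_000

    client = MongoClient(MONGO_URI)
    coll = client.benchdb.docs
    coll.drop()                             # start from an empty collection

    docs = [{"i": i, "payload": "x" * 100} for i in range(N)]
    t0 = time.perf_counter()
    coll.insert_many(docs)
    insert_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    matched = coll.count_documents({"i": {"$lt": N // 2}})
    query_s = time.perf_counter() - t0

    print(f"{N} inserts in {insert_s:.2f}s, range count in {query_s:.3f}s")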