894 research outputs found
Middleware-based Database Replication: The Gaps between Theory and Practice
The need for high availability and performance in data management systems has
been fueling a long running interest in database replication from both academia
and industry. However, academic groups often attack replication problems in
isolation, overlooking the need for completeness in their solutions, while
commercial teams take a holistic approach that often misses opportunities for
fundamental innovation. This has created over time a gap between academic
research and industrial practice.
This paper aims to characterize the gap along three axes: performance,
availability, and administration. We build on our own experience developing and
deploying replication systems in commercial and academic settings, as well as
on a large body of prior related work. We sift through representative examples
from the last decade of open-source, academic, and commercial database
replication systems and combine this material with case studies from real
systems deployed at Fortune 500 customers. We propose two agendas, one for
academic research and one for industrial R&D, which we believe can bridge the
gap within 5-10 years. This way, we hope to both motivate and help researchers
in making the theory and practice of middleware-based database replication more
relevant to each other.Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on
Management of Data, Vancouver, Canada, June 200
Design and Implementation of a Distributed Middleware for Parallel Execution of Legacy Enterprise Applications
A typical enterprise uses a local area network of computers to perform its
business. During the off-working hours, the computational capacities of these
networked computers are underused or unused. In order to utilize this
computational capacity an application has to be recoded to exploit concurrency
inherent in a computation which is clearly not possible for legacy applications
without any source code. This thesis presents the design an implementation of a
distributed middleware which can automatically execute a legacy application on
multiple networked computers by parallelizing it. This middleware runs multiple
copies of the binary executable code in parallel on different hosts in the
network. It wraps up the binary executable code of the legacy application in
order to capture the kernel level data access system calls and perform them
distributively over multiple computers in a safe and conflict free manner. The
middleware also incorporates a dynamic scheduling technique to execute the
target application in minimum time by scavenging the available CPU cycles of
the hosts in the network. This dynamic scheduling also supports the CPU
availability of the hosts to change over time and properly reschedule the
replicas performing the computation to minimize the execution time. A prototype
implementation of this middleware has been developed as a proof of concept of
the design. This implementation has been evaluated with a few typical case
studies and the test results confirm that the middleware works as expected
Review of Some Transaction Models used in Mobile Databases
Mobile computing is presently experiencing a period of unprecedented growth with the convergence of communication and computing capabilities of mobile phones and personal digital assistant. However, mobile computing presents many inherent problems that lead to poor network connectivity. To overcome poor connectivity and reduce cost, mobile clients are forced to operate in disconnected and partially connected modes. One of the main goals of mobile data access is to reach the ubiquity inherent to the mobile systems: to access information regardless of time and place. Due to mobile systems restrictions such as, for instance, limited memory and narrow bandwidth, it is only natural that researchers expend efforts to soothe such issues. This work approaches the issues regarding the cache management in mobile databases, with emphasis in techniques to reduce cache faults while the mobile device is either connected, or with a narrow bandwidth, or disconnected at all. Thus, it is expected improve data availability while a disconnection. Here in the paper, we try to describe various mobile transaction models, focusing on versatile data sharing mechanisms in volatile mobile environments
Review of Some Transaction Models used in Mobile Databases
Mobile computing is presently experiencing a period of unprecedented growth with the convergence of communication and computing capabilities of mobile phones and personal digital assistant. However, mobile computing presents many inherent problems that lead to poor network connectivity. To overcome poor connectivity and reduce cost, mobile clients are forced to operate in disconnected and partially connected modes. One of the main goals of mobile data access is to reach the ubiquity inherent to the mobile systems: to access information regardless of time and place. Due to mobile systems restrictions such as, for instance, limited memory and narrow bandwidth, it is only natural that researchers expend efforts to soothe such issues. This work approaches the issues regarding the cache management in mobile databases, with emphasis in techniques to reduce cache faults while the mobile device is either connected, or with a narrow bandwidth, or disconnected at all. Thus, it is expected improve data availability while a disconnection. Here in the paper, we try to describe various mobile transaction models, focusing on versatile data sharing mechanisms in volatile mobile environments
A Tour of Gallifrey, a Language for Geodistributed Programming
Programming efficient distributed, concurrent systems requires new abstractions that go beyond traditional sequential programming. But programmers already have trouble getting sequential code right, so simplicity is essential. The core problem is that low-latency, high-availability access to data requires replication of mutable state. Keeping replicas fully consistent is expensive, so the question is how to expose asynchronously replicated objects to programmers in a way that allows them to reason simply about their code. We propose an answer to this question in our ongoing work designing a new language, Gallifrey, which provides orthogonal replication through _restrictions_ with _merge strategies_, _contingencies_ for conflicts arising from concurrency, and _branches_, a novel concurrency control construct inspired by version control, to contain provisional behavior
Khazana: a flexible wide area data store
technical reportKhazana is a peer-to-peer data service that supports efficient sharing and aggressive caching of mutable data across the wide area while giving clients significant control over replica divergence. Previous work on wide-area replicated services focussed on at most two of the following three properties: aggressive replication, customizable consistency, and generality. In contrast, Khazana provides scalable support for large numbers of replicas while giving applications considerable flexibility in trading off consistency for availability and performance. Its flexibility enables applications to effectively exploit inherent data locality while meeting consistency needs. Khazana exports a file system-like interface with a small set of consistency controls which can be combined to yield a broad spectrum of consistency flavors ranging from strong consistency to best-effort eventual consistency. Khazana servers form failure-resilient dynamic replica hierarchies to manage replicas across variable quality network links. In this report, we outline Khazana?s design and show how its flexibility enables three diverse network services built on top of it to meet their individual consistency and performance needs: (i) a wide-area replicated file system that supports serializable writes as well as traditional file sharing across wide area, (ii) an enterprise data service that exploits locality by caching enterprise data closer to end-users while ensuring strong consistency for data integrity, and (iii) a replicated database that reaps order of magnitude gains in throughput by relaxing consistency
Quality measures for ETL processes: from goals to implementation
Extraction transformation loading (ETL) processes play an increasingly important role for the support of modern business operations. These business processes are centred around artifacts with high variability and diverse lifecycles, which correspond to key business entities. The apparent complexity of these activities has been examined through the prism of business process management, mainly focusing on functional requirements and performance optimization. However, the quality dimension has not yet been thoroughly investigated, and there is a need for a more human-centric approach to bring them closer to business-users requirements. In this paper, we take a first step towards this direction by defining a sound model for ETL process quality characteristics and quantitative measures for each characteristic, based on existing literature. Our model shows dependencies among quality characteristics and can provide the basis for subsequent analysis using goal modeling techniques. We showcase the use of goal modeling for ETL process design through a use case, where we employ the use of a goal model that includes quantitative components (i.e., indicators) for evaluation and analysis of alternative design decisions.Peer ReviewedPostprint (author's final draft
- …