27,283 research outputs found

    Middleware-based Database Replication: The Gaps between Theory and Practice

    Get PDF
    The need for high availability and performance in data management systems has been fueling a long running interest in database replication from both academia and industry. However, academic groups often attack replication problems in isolation, overlooking the need for completeness in their solutions, while commercial teams take a holistic approach that often misses opportunities for fundamental innovation. This has created over time a gap between academic research and industrial practice. This paper aims to characterize the gap along three axes: performance, availability, and administration. We build on our own experience developing and deploying replication systems in commercial and academic settings, as well as on a large body of prior related work. We sift through representative examples from the last decade of open-source, academic, and commercial database replication systems and combine this material with case studies from real systems deployed at Fortune 500 customers. We propose two agendas, one for academic research and one for industrial R&D, which we believe can bridge the gap within 5-10 years. This way, we hope to both motivate and help researchers in making the theory and practice of middleware-based database replication more relevant to each other.Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on Management of Data, Vancouver, Canada, June 200

    Run Time Approximation of Non-blocking Service Rates for Streaming Systems

    Full text link
    Stream processing is a compute paradigm that promises safe and efficient parallelism. Modern big-data problems are often well suited for stream processing's throughput-oriented nature. Realization of efficient stream processing requires monitoring and optimization of multiple communications links. Most techniques to optimize these links use queueing network models or network flow models, which require some idea of the actual execution rate of each independent compute kernel within the system. What we want to know is how fast can each kernel process data independent of other communicating kernels. This is known as the "service rate" of the kernel within the queueing literature. Current approaches to divining service rates are static. Modern workloads, however, are often dynamic. Shared cloud systems also present applications with highly dynamic execution environments (multiple users, hardware migration, etc.). It is therefore desirable to continuously re-tune an application during run time (online) in response to changing conditions. Our approach enables online service rate monitoring under most conditions, obviating the need for reliance on steady state predictions for what are probably non-steady state phenomena. First, some of the difficulties associated with online service rate determination are examined. Second, the algorithm to approximate the online non-blocking service rate is described. Lastly, the algorithm is implemented within the open source RaftLib framework for validation using a simple microbenchmark as well as two full streaming applications.Comment: technical repor

    Performance Testing of Distributed Component Architectures

    Get PDF
    Performance characteristics, such as response time, throughput andscalability, are key quality attributes of distributed applications. Current practice,however, rarely applies systematic techniques to evaluate performance characteristics.We argue that evaluation of performance is particularly crucial in early developmentstages, when important architectural choices are made. At first glance, thiscontradicts the use of testing techniques, which are usually applied towards the endof a project. In this chapter, we assume that many distributed systems are builtwith middleware technologies, such as the Java 2 Enterprise Edition (J2EE) or theCommon Object Request Broker Architecture (CORBA). These provide servicesand facilities whose implementations are available when architectures are defined.We also note that it is the middleware functionality, such as transaction and persistenceservices, remote communication primitives and threading policy primitives,that dominates distributed system performance. Drawing on these observations, thischapter presents a novel approach to performance testing of distributed applications.We propose to derive application-specific test cases from architecture designs so thatthe performance of a distributed application can be tested based on the middlewaresoftware at early stages of a development process. We report empirical results thatsupport the viability of the approach
    corecore