747 research outputs found

    Producing approximate answers to database queries

    Get PDF
    We have designed and implemented a query processor, called APPROXIMATE, that makes approximate answers available if part of the database is unavailable or if there is not enough time to produce an exact answer. The accuracy of the approximate answers produced improves monotonically with the amount of data retrieved to produce the result. The exact answer is produced if all of the needed data are available and query processing is allowed to continue until completion. The monotone query processing algorithm of APPROXIMATE works within the standard relational algebra framework and can be implemented on a relational database system with little change to the relational architecture. We describe here the approximation semantics of APPROXIMATE that serves as the basis for meaningful approximations of both set-valued and single-valued queries. We show how APPROXIMATE is implemented to make effective use of semantic information, provided by an object-oriented view of the database, and describe the additional overhead required by APPROXIMATE

    Monotonically improving approximate answers to relational algebra queries

    Get PDF
    We present here a query processing method that produces approximate answers to queries posed in standard relational algebra. This method is monotone in the sense that the accuracy of the approximate result improves with the amount of time spent producing the result. This strategy enables us to trade the time to produce the result for the accuracy of the result. An approximate relational model that characterizes appromimate relations and a partial order for comparing them is developed. Relational operators which operate on and return approximate relations are defined

    Measuring and Managing Answer Quality for Online Data-Intensive Services

    Full text link
    Online data-intensive services parallelize query execution across distributed software components. Interactive response time is a priority, so online query executions return answers without waiting for slow running components to finish. However, data from these slow components could lead to better answers. We propose Ubora, an approach to measure the effect of slow running components on the quality of answers. Ubora randomly samples online queries and executes them twice. The first execution elides data from slow components and provides fast online answers; the second execution waits for all components to complete. Ubora uses memoization to speed up mature executions by replaying network messages exchanged between components. Our systems-level implementation works for a wide range of platforms, including Hadoop/Yarn, Apache Lucene, the EasyRec Recommendation Engine, and the OpenEphyra question answering system. Ubora computes answer quality much faster than competing approaches that do not use memoization. With Ubora, we show that answer quality can and should be used to guide online admission control. Our adaptive controller processed 37% more queries than a competing controller guided by the rate of timeouts.Comment: Technical Repor

    A Survey on the Evolution of Stream Processing Systems

    Full text link
    Stream processing has been an active research field for more than 20 years, but it is now witnessing its prime time due to recent successful efforts by the research community and numerous worldwide open-source communities. This survey provides a comprehensive overview of fundamental aspects of stream processing systems and their evolution in the functional areas of out-of-order data management, state management, fault tolerance, high availability, load management, elasticity, and reconfiguration. We review noteworthy past research findings, outline the similarities and differences between early ('00-'10) and modern ('11-'18) streaming systems, and discuss recent trends and open problems.Comment: 34 pages, 15 figures, 5 table

    BlinkDB: queries with bounded errors and bounded response times on very large data

    Get PDF
    In this paper, we present BlinkDB, a massively parallel, approximate query engine for running interactive SQL queries on large volumes of data. BlinkDB allows users to trade-off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars. To achieve this, BlinkDB uses two key ideas: (1) an adaptive optimization framework that builds and maintains a set of multi-dimensional stratified samples from original data over time, and (2) a dynamic sample selection strategy that selects an appropriately sized sample based on a query's accuracy or response time requirements. We evaluate BlinkDB against the well-known TPC-H benchmarks and a real-world analytic workload derived from Conviva Inc., a company that manages video distribution over the Internet. Our experiments on a 100 node cluster show that BlinkDB can answer queries on up to 17 TBs of data in less than 2 seconds (over 200 x faster than Hive), within an error of 2-10%.National Science Foundation (U.S.) (CISE Expeditions Award CCF-1139158)United States. Defense Advanced Research Projects Agency (XData Award FA8750-12-2-0331)

    The Sixth Annual Workshop on Space Operations Applications and Research (SOAR 1992)

    Get PDF
    This document contains papers presented at the Space Operations, Applications, and Research Symposium (SOAR) hosted by the U.S. Air Force (USAF) on 4-6 Aug. 1992 and held at the JSC Gilruth Recreation Center. The symposium was cosponsored by the Air Force Material Command and by NASA/JSC. Key technical areas covered during the symposium were robotic and telepresence, automation and intelligent systems, human factors, life sciences, and space maintenance and servicing. The SOAR differed from most other conferences in that it was concerned with Government-sponsored research and development relevant to aerospace operations. The symposium's proceedings include papers covering various disciplines presented by experts from NASA, the USAF, universities, and industry

    An Incremental Anytime Algorithm for Multi-Objective Query Optimization

    Get PDF
    Query plans offer diverse tradeoffs between conflicting cost metrics such as execution time, energy consumption, or execution fees in a multi-objective scenario. It is convenient for users to choose the desired cost tradeoff in an interactive process, dynamically adding constraints and finally selecting the best plan based on a continuously refined visualization of optimal cost tradeoffs. Multi-objective query optimization (MOQO) algorithms must possess specific properties to support such an interactive process: First, they must be anytime algorithms, generating multiple result plan sets of increasing quality with low latency between consecutive results. Second, they must be incremental, meaning that they avoid regenerating query plans when being invoked several times for the same query but with slightly different user constraints. We present an incremental anytime algorithm for MOQO, analyze its complexity and show that it offers an attractive tradeoff between result update frequency, single invocation time complexity, and amortized time over multiple invocations. Those properties make it suitable to be used within an interactive query optimization process. We evaluate the algorithm in comparison with prior work on TPC-H queries; our implementation is based on the Postgres database management system

    APPROXIMATE: A Query Processor That Produces Monotonically Improving Approximate Answers

    No full text
    145 p.Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 1993.For some applications, it may be better for a database to produce an approximate answer when it is not possible to produce an exact answer. We have designed and implemented a query processor, called APPROXIMATE, that makes approximate answers available if part of the database is unavailable or if there is not enough time to produce an exact answer. The accuracy of the approximate answers produced improves monotonically with the amount of data retrieved to produce the result. APPROXIMATE returns the exact answer if all of the needed data are available and if there is enough time to continue with the processing. The latest, best available approximate answer is returned if the user demands an answer before query processing is completed. The approximate query processing algorithm of APPROXIMATE works within a standard relational algebra framework.In this thesis, we present the approximation semantics of APPROXIMATE that serves as the basis for meaningful approximations to set-valued and single-valued queries, and the approximate query processing algorithm used by APPROXIMATE. APPROXIMATE uses an object-oriented view of the database. This view provides semantic information about the initial approximations to a query and information about which segments of the database can be retrieved. A distance function is defined to provide additional semantic support needed to compare the accuracies of approximate answers to the exact answer and information about the convergence of approximate answers. We describe the overhead incurred by this semantic support and present an implementation of APPROXIMATE.U of I OnlyRestricted to the U of I community idenfinitely during batch ingest of legacy ETD
    corecore