12 research outputs found

    Scrambling Query Plans to Cope With Unexpected Delays

    Get PDF
    Accessing numerous widely-distributed data sources poses significant new challenges for query optimization and execution. Congestion or failure in the network introduce highly-variable response times for wide-area data access. This paper is an initial exploration of solutions to this variability. We investigate a class of dynamic, run-time query plan modification techniques that we call query plan scrambling. We present an algorithm which modifies execution plans on-the-fly in response to unexpected delays in data access. The algorithm both reschedules operators and introduces new operators into the plan. We present simulation results that show how our technique effectively hides delays in receiving the initial requested tuples from remote data sources. (Also cross-referenced as UMIACS-TR-96-35

    Robust Query Optimization Methods With Respect to Estimation Errors: A Survey

    Get PDF
    International audienceThe quality of a query execution plan chosen by a Cost-Based Optimizer (CBO) depends greatly on the estimation accuracy of input parameter values. Many research results have been produced on improving the estimation accuracy, but they do not work for every situation. Therefore, "robust query optimization" was introduced, in an effort to minimize the sub-optimality risk by accepting the fact that estimates could be inaccurate. In this survey, we aim to provide an overview of robust query optimization methods by classifying them into different categories, explaining the essential ideas, listing their advantages and limitations, and comparing them with multiple criteria

    Evolutionary techniques for updating query cost models in a dynamic multidatabase environment

    Full text link
    Deriving local cost models for query optimization in a dynamic multidatabase system (MDBS) is a challenging issue. In this paper, we study how to evolve a query cost model to capture a slowly-changing dynamic MDBS environment so that the cost model is kept up-to-date all the time. Two novel evolutionary techniques, i.e., the shifting method and the block-moving method, are proposed. The former updates a cost model by taking up-to-date information from a new sample query into consideration at each step, while the latter considers a block (batch) of new sample queries at each step. The relevant issues, including derivation of recurrence updating formulas, development of efficient algorithms, analysis and comparison of complexities, and design of an integrated scheme to apply the two methods adaptively, are studied. Our theoretical and experimental results demonstrate that the proposed techniques are quite promising in maintaining accurate cost models efficiently for a slowly changing dynamic MDBS environment. Besides the application to MDBSs, the proposed techniques can also be applied to the automatic maintenance of cost models in self-managing database systems.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/47868/1/778_2003_Article_110.pd

    Query Scrambling for Bursty Data Arrival.

    Get PDF
    Distributed databases operating over wide-area networks, such as the Internet, must deal with the unpredictable nature of the performance of communication. The response times of accessing remote sources may vary widely due to network congestion, link failure, and other problems. In this paper we examine a new class of methods, called query scrambling, for dealing with unpredictable response times. Query scrambling dynamically modifies query execution plans on-the-fly in reaction to unexpected delays in data access. We explore various choices in the implementation of these methods and examine, through a detailed simulation, the effects of these choices. Our experimental environment considers pipelined and non-pipelined join processing in a client with multiple remote data sources and it focuses on bursty arrivals of data. We identify and study a number of the basic trade-offs that arise when designing scrambling policies for the bursty environment. Our performance results show that query scrambling is effective in hiding the impact of delays on query response time for a number of different delay scenarios. (Also cross-referenced as UMIACS-TR-96-84

    Dynamic Query Operator Scheduling for Wide-Area Remote Access

    Get PDF
    Distributed databases operating over wide-area networks such as the Internet, must deal with the unpredictable nature of the performance of communication. The response times of accessing remote sources can vary widely due to network congestion, link failure, and other problems. In such an unpredictable environment, the traditional iterator-based query execution model performs poorly. We have developed a class of methods, called query scrambling, for dealing explicitly with the problem of unpredictable response times. Query scrambling dynamically modifies query execution plans on-the-fly in reaction to unexpected delays in data access. In this paper we focus on the dynamic scheduling of query operators in the context of query scrambling. We explore various choices for dynamic scheduling and examine, through a detailed simulation, the effects of these choices. Our experimental environment considers pipelined and non-pipelined join processing in a client with multiple remote data sources and delayed or possibly bursty arrivals of data. Our performance results show that scrambling rescheduling is effective in hiding the impact of delays on query response time for a number of different delay scenarios

    A lightweight multi-database execution engine

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998.Includes bibliographical references (leaf [93]).by Ricardo S. Ambrose.S.M

    Online Integration of Semistructured Data

    Get PDF
    Data integration systems play an important role in the development of distributed multi-database systems. Data integration collects data from heterogeneous and distributed sources, and provides a global view of data to the users. Systems need to process user\u27s applications in the shortest possible time. The virtualization approach to data integration systems ensures that the answers to user requests are the most up-to-date ones. In contrast, the materialization approach reduces data transmission time at the expense of data consistency between the central and remote sites. The virtualization approach to data integration systems can be applied in either batch or online mode. Batch processing requires all data to be available at a central site before processing is started. Delays in transmission of data over a network contribute to a longer processing time. On the other hand, in an online processing mode data integration is performed piece-by-piece as soon as a unit of data is available at the central site. An online processing mode presents the partial results to the users earlier. Due to the heterogeneity of data models at the remote sites, a semistructured global view of data is required. The performance of data integration systems depends on an appropriate data model and the appropriate data integration algorithms used. This thesis presents a new algorithm for immediate processing of data collected from remote and autonomous database systems. The algorithm utilizes the idle processing states while the central site waits for completion of data transmission to produce instant partial results. A decomposition strategy included in the algorithm balances of the computations between the central and remote sites to force maximum resource utilization at both sites. The thesis chooses the XML data model for the representation of semistructured data, and presents a new formalization of the XML data model together with a set of algebraic operations. The XML data model is used to provide a virtual global view of semistructured data. The algebraic operators are consistent with operations of relational algebra, such that any existing syntax based query optimization technique developed for the relational model of data can be directly applied. The thesis shows how to optimize online processing by generating one online integration plan for several data increments. Further, the thesis shows how each independent increment expression can be processed in a parallel mode on a multi core processor system. The dynamic scheduling system proposed in the thesis is able to defer or terminate a plan such that materialization updates and unnecessary computations are minimized. The thesis shows that processing data chunks of fragmented XML documents allows for data integration in a shorter period of time. Finally, the thesis provides a clear formalization of the semistructured data model, a set of algorithms with high-level descriptions, and running examples. These formal backgrounds show that the proposed algorithms are implementable

    Federated Query Processing for the Semantic Web

    Get PDF
    The recent years have witnessed a constant growth in the amount of RDF data available on the Web. This growth is largely based on the increasing rate of data publication on the Web by different actors such governments, life science researchers or geographical institutes. RDF data generation is mainly done by converting already existing legacy data resources into RDF (e.g. converting data stored in relational databases into RDF), but also by creating that RDF data directly (e.g. sensors). These RDF data are normally exposed by means of Linked Data-enabled URIs and SPARQL endpoints. Given the sustained growth that we are experiencing in the number of SPARQL endpoints available, the need to be able to send federated SPARQL queries across them has also grown. Tools for accessing sets of RDF data repositories are starting to appear, differing between them on the way in which they allow users to access these data (allowing users to specify directly what RDF data set they want to query, or making this process transparent to them). To overcome this heterogeneity in federated query processing solutions, the W3C SPARQL working group is defining a federation extension for SPARQL 1.1, which allows combining in a single query, graph patterns that can be evaluated in several endpoints. In this PhD thesis, we describe the syntax of that SPARQL extension for providing access to distributed RDF data sets and formalise its semantics. We adapt existing techniques for distributed data access in relational databases in order to deal with SPARQL endpoints, which we have implemented in our federation query evaluation system (SPARQL-DQP). We describe the static optimisation techniques that we implemented in our system and we carry out a series of experiments that show that our optimisations significantly speed up the query evaluation process in presence of large query results and optional operator
    corecore