
    Scrambling Query Plans to Cope With Unexpected Delays

    Accessing numerous widely-distributed data sources poses significant new challenges for query optimization and execution. Congestion or failures in the network introduce highly variable response times for wide-area data access. This paper is an initial exploration of solutions to this variability. We investigate a class of dynamic, run-time query plan modification techniques that we call query plan scrambling. We present an algorithm that modifies execution plans on-the-fly in response to unexpected delays in data access; the algorithm both reschedules operators and introduces new operators into the plan. We present simulation results showing how our technique effectively hides delays in receiving the initial requested tuples from remote data sources. (Also cross-referenced as UMIACS-TR-96-35.)
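
    The reaction loop described above can be pictured roughly as follows. This is a minimal Python sketch, not the paper's algorithm: the Operator class and scramble_step function are illustrative assumptions, and only the rescheduling phase is shown (the second phase, which synthesizes new operators over materialized results, is omitted).

        from dataclasses import dataclass

        @dataclass
        class Operator:
            name: str
            sources: list          # remote sources this operator still needs

        def scramble_step(pending, delayed):
            """Pick an operator that can make progress while some sources are delayed."""
            for op in pending:
                if not any(src in delayed for src in op.sources):
                    return op      # run this subplan while the delayed source recovers
            return None            # everything left waits on a delayed source

        # Example: source "R2" stalls, so the join over R3 and R4 is scheduled first.
        pending = [Operator("join(R1,R2)", ["R1", "R2"]),
                   Operator("join(R3,R4)", ["R3", "R4"])]
        print(scramble_step(pending, delayed={"R2"}).name)   # -> join(R3,R4)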

    Dynamic Query Operator Scheduling for Wide-Area Remote Access

    Distributed databases operating over wide-area networks, such as the Internet, must deal with the unpredictable nature of communication performance. The response times of accesses to remote sources can vary widely due to network congestion, link failure, and other problems. In such an unpredictable environment, the traditional iterator-based query execution model performs poorly. We have developed a class of methods, called query scrambling, for dealing explicitly with the problem of unpredictable response times. Query scrambling dynamically modifies query execution plans on-the-fly in reaction to unexpected delays in data access. In this paper we focus on the dynamic scheduling of query operators in the context of query scrambling. We explore various choices for dynamic scheduling and examine, through a detailed simulation, the effects of these choices. Our experimental environment considers pipelined and non-pipelined join processing in a client with multiple remote data sources and delayed or bursty arrivals of data. Our performance results show that the rescheduling performed by query scrambling is effective in hiding the impact of delays on query response time across a number of different delay scenarios.
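
    To make the scheduling question concrete, a toy version of one such choice is sketched below. The (operator, source) pair representation and the single set of "ready" sources are simplifying assumptions made for this example, not the simulator used in the paper.

        def dynamic_schedule(operators, ready_sources):
            """Order operators so that work on responsive sources runs first.

            operators: list of (operator_name, remote_source) pairs in plan order.
            ready_sources: set of sources currently delivering tuples.
            """
            runnable, deferred = [], []
            for op, src in operators:
                if src in ready_sources:
                    runnable.append(op)   # its data is flowing; run it now
                else:
                    deferred.append(op)   # revisit once the source starts responding
            return runnable + deferred

        plan = [("scan(R1)", "site-A"), ("scan(R2)", "site-B"), ("scan(R3)", "site-C")]
        print(dynamic_schedule(plan, ready_sources={"site-A", "site-C"}))
        # -> ['scan(R1)', 'scan(R3)', 'scan(R2)']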

    Query Scrambling for Bursty Data Arrival.

    Distributed databases operating over wide-area networks, such as the Internet, must deal with the unpredictable nature of communication performance. The response times of accesses to remote sources may vary widely due to network congestion, link failure, and other problems. In this paper we examine a new class of methods, called query scrambling, for dealing with unpredictable response times. Query scrambling dynamically modifies query execution plans on-the-fly in reaction to unexpected delays in data access. We explore various choices in the implementation of these methods and examine, through a detailed simulation, the effects of these choices. Our experimental environment considers pipelined and non-pipelined join processing in a client with multiple remote data sources, and it focuses on bursty arrivals of data. We identify and study a number of the basic trade-offs that arise when designing scrambling policies for the bursty environment. Our performance results show that query scrambling is effective in hiding the impact of delays on query response time for a number of different delay scenarios. (Also cross-referenced as UMIACS-TR-96-84.)
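
    One of those trade-offs, roughly: how aggressively to use gaps between bursts for other work versus how promptly to resume the original plan when data starts flowing again. The sketch below is only illustrative; the gap timeout and the one-unit-of-work granularity are assumptions, not the policies evaluated in the paper.

        import queue

        def consume_bursty(arrivals, process_tuple, scrambled_work, gap_timeout=0.05):
            """Interleave normal processing with scrambled work during arrival gaps.

            arrivals: queue.Queue of tuples, with None as an end-of-stream marker.
            """
            while True:
                try:
                    t = arrivals.get(timeout=gap_timeout)  # wait briefly for the next burst
                except queue.Empty:
                    scrambled_work()   # use the gap for one bounded unit of other work
                    continue
                if t is None:
                    return             # stream finished
                process_tuple(t)       # the original plan's work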

    Robust Query Optimization Methods With Respect to Estimation Errors: A Survey

    The quality of a query execution plan chosen by a Cost-Based Optimizer (CBO) depends greatly on the estimation accuracy of input parameter values. Much research has gone into improving estimation accuracy, but no technique works in every situation. "Robust query optimization" was therefore introduced: it accepts that estimates may be inaccurate and aims to minimize the risk of choosing a badly sub-optimal plan. In this survey, we provide an overview of robust query optimization methods by classifying them into categories, explaining their essential ideas, listing their advantages and limitations, and comparing them against multiple criteria.
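
    As one concrete illustration of the kind of idea such methods build on (this is only a toy of the min-max principle, not a specific technique from the survey): rather than trusting a single cardinality estimate, a robust optimizer can compare candidate plans across an interval of plausible cardinalities and keep the plan whose worst-case cost is smallest.

        def robust_choice(plans, cardinality_range):
            """plans: dict mapping plan name -> cost as a function of the true cardinality.
            cardinality_range: iterable of plausible cardinality values."""
            return min(plans, key=lambda p: max(plans[p](c) for c in cardinality_range))

        plans = {
            "index-nested-loop": lambda card: 10 + 5 * card,      # cheap only if few rows match
            "hash-join":         lambda card: 2000 + 0.2 * card,  # stable for large inputs
        }
        # The estimate says ~100 rows, but anything up to 100,000 is plausible.
        print(robust_choice(plans, cardinality_range=[100, 1_000, 10_000, 100_000]))
        # -> 'hash-join': its worst case (22,000) beats index-nested-loop's (500,010)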

    Three light-weight execution engines in Java for web data-intensive data source contents: (extended abstract)

    Title from cover. "March, 1998." Includes bibliographical references (p. 8-9). By Ricardo Ambrose ... [et al.]

    A framework for building relational query execution engines

    Integrated access to information published on the web makes it possible to offer new services by combining sites that were originally designed autonomously. Applications of this kind involve distributed query execution with integrated access to web sites. This process, however, can suffer serious performance problems because the access time to these sites is unpredictable. Based on a study of the characteristics found in query execution models, we developed the QEEF framework for building query engines for different execution models. To validate it, we implemented the AQEE execution engine, which incorporates an adaptive model suited to the unpredictability of site access times, and carried out a case study based on this model. The QEEF framework was integrated into the CoDIMS environment, a flexible architecture for generating configured data integration systems. (Track: Databases. Red de Universidades con Carreras en Informática, RedUNCI.)
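
    A minimal sketch of the plug-in structure such a framework suggests: the engine is written against an abstract execution model, and an adaptive model (in the spirit of the one AQEE incorporates for unpredictable site access times) is just another implementation. The class and method names below are illustrative assumptions, not QEEF's actual API.

        from abc import ABC, abstractmethod

        class ExecutionModel(ABC):
            @abstractmethod
            def execute(self, plan, sources):
                """Run `plan` (a list of source names) against `sources` and return rows."""

        class IteratorModel(ExecutionModel):
            def execute(self, plan, sources):
                # Classic demand-driven execution: pull from each source in plan order.
                return [row for src in plan for row in sources[src]()]

        class AdaptiveModel(ExecutionModel):
            def execute(self, plan, sources, max_rounds=3):
                # Adaptive variant: take whatever is available now, retry slow sites later.
                out, pending = [], list(plan)
                for _ in range(max_rounds):
                    still_waiting = []
                    for src in pending:
                        rows = sources[src]()   # an empty list models a site not responding yet
                        if rows:
                            out.extend(rows)
                        else:
                            still_waiting.append(src)
                    pending = still_waiting
                    if not pending:
                        break
                return out

        def build_engine(model: ExecutionModel):
            # The framework wires a chosen execution model into a query engine.
            return lambda plan, sources: model.execute(plan, sources)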

    Opportunistic sensing and mobile data delivery in the CarTel System

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 94-102). Wide-area sensor systems enable a broad class of applications, including the fine-grained monitoring of traffic congestion, road surface conditions, and pollution. This dissertation shows that it is possible to build a low-cost, wide-area sensor system. Our approach relies on two techniques: using existing motion from such sources of mobility as cars and people to provide coverage (opportunistic mobility), and using the abundance of short-duration network connections to provide low-cost data delivery (opportunistic networking). We use these two techniques to build a mobile sensor computing system called CarTel, to collect, process, deliver, and visualize spatially diverse data. CarTel consists of three key components: hardware placed in users' cars to provide remote sensing, a communication stack called CafNet to take advantage of opportunistic networking, and a web-based portal for data visualization. This dissertation describes the design and implementation of these three components. In addition, we analyze the properties of opportunistic networking and mobility. To show the viability of opportunistic networking, we studied Internet access from moving vehicles and found that the median duration of link-layer connectivity at vehicular speeds was 13 seconds, that the median connection upload bandwidth was 30 KBytes/s, and that the mean duration between successful associations to APs was 75 seconds. To show the viability of opportunistic mobility, we used a simulation and found that after as little as 100 drive hours, a CarTel deployment could achieve over 80 percent coverage of useful roads for a traffic congestion monitoring application. By Bret W. Hull. Ph.D.
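
    To illustrate the opportunistic-networking idea (buffer data locally, drain it whenever a short connectivity window appears), here is a toy sketch. CafNet's real interface is callback-based and considerably richer; the class and method names below are assumptions made only for this example.

        from collections import deque

        class BufferedUploader:
            def __init__(self, send, max_buffered=10_000):
                self.send = send                          # callable; may raise ConnectionError
                self.buffer = deque(maxlen=max_buffered)  # oldest readings dropped if storage fills

            def record(self, reading):
                """Called for every sensor reading, regardless of connectivity."""
                self.buffer.append(reading)

            def on_connectivity(self):
                """Called when an open AP is associated; windows may last only seconds."""
                while self.buffer:
                    reading = self.buffer[0]
                    try:
                        self.send(reading)        # upload while the link lasts
                    except ConnectionError:
                        return                    # window closed; keep data for the next AP
                    self.buffer.popleft()         # discard only after a successful send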

    Online Integration of Semistructured Data

    Data integration systems play an important role in the development of distributed multi-database systems. Data integration collects data from heterogeneous and distributed sources and provides a global view of the data to users. Such systems need to process users' applications in the shortest possible time. The virtualization approach to data integration ensures that answers to user requests are the most up-to-date ones. In contrast, the materialization approach reduces data transmission time at the expense of data consistency between the central and remote sites. The virtualization approach can be applied in either batch or online mode. Batch processing requires all data to be available at a central site before processing starts; delays in transmitting data over the network therefore lengthen total processing time. In an online processing mode, by contrast, data integration is performed piece-by-piece as soon as a unit of data is available at the central site, so partial results can be presented to users earlier. Due to the heterogeneity of data models at the remote sites, a semistructured global view of data is required. The performance of data integration systems depends on an appropriate data model and appropriate integration algorithms. This thesis presents a new algorithm for immediate processing of data collected from remote and autonomous database systems. The algorithm utilizes the idle processing states while the central site waits for completion of data transmission, producing instant partial results. A decomposition strategy included in the algorithm balances the computations between the central and remote sites to maximize resource utilization at both sites. The thesis chooses the XML data model for the representation of semistructured data and presents a new formalization of this data model together with a set of algebraic operations. The XML data model is used to provide a virtual global view of semistructured data. The algebraic operators are consistent with the operations of relational algebra, so any existing syntax-based query optimization technique developed for the relational model can be applied directly. The thesis shows how to optimize online processing by generating one online integration plan for several data increments, and how each independent increment expression can be processed in parallel on a multi-core processor system. The dynamic scheduling system proposed in the thesis is able to defer or terminate a plan so that materialization updates and unnecessary computations are minimized. The thesis shows that processing data chunks of fragmented XML documents allows data integration to complete in a shorter period of time. Finally, the thesis provides a clear formalization of the semistructured data model, a set of algorithms with high-level descriptions, and running examples. This formal background shows that the proposed algorithms are implementable.
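
    A small sketch of the online, increment-at-a-time idea, as opposed to batch integration that waits for all data to reach the central site. The integrate_increment callback and the list-of-records representation are illustrative assumptions, not the thesis's XML algebra or scheduling system.

        def integrate_online(increment_stream, integrate_increment, emit_partial):
            """Fold each arriving data increment into the running result immediately,
            publishing partial answers instead of waiting for all transmissions."""
            result = None
            for increment in increment_stream:       # e.g., chunks of a fragmented XML document
                result = integrate_increment(result, increment)
                emit_partial(result)                 # users see partial results early
            return result

        # Example with a trivial "integration" step (concatenating record lists):
        chunks = [[{"id": 1}], [{"id": 2}, {"id": 3}]]
        integrate_online(iter(chunks),
                         integrate_increment=lambda acc, inc: (acc or []) + inc,
                         emit_partial=print)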