    Pay One, Get Hundreds for Free: Reducing Cloud Costs through Shared Query Execution

    Cloud-based data analysis is now common practice because of its lower system-management overhead and its pay-as-you-go pricing model. The pricing model, however, is not always suitable for query processing, as heavy use results in high costs. For example, in query-as-a-service systems, where users are charged per processed byte, collections of queries that frequently access the same data can become expensive. The problem is compounded by the limited options users have to optimize query execution when using declarative interfaces such as SQL. In this paper, we show how, without modifying existing systems and without the involvement of the cloud provider, it is possible to significantly reduce the overhead, and hence the cost, of query-as-a-service systems. Our approach is based on query rewriting, so that multiple concurrent queries are combined into a single query. Our experiments show that the aggregate amount of work done by the shared execution is smaller than in a query-at-a-time approach. Since queries are charged per byte processed, the cost of executing a group of queries is often the same as that of executing a single one of them. As an example, we demonstrate that the shared execution of the TPC-H benchmark is up to 100x cheaper in Amazon Athena and up to 16x cheaper in Google BigQuery than a query-at-a-time approach, while achieving higher throughput.
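
    The idea of combining concurrent queries into one can be illustrated with a minimal Python sketch. It assumes the simplest case, a batch of single-table SUM aggregates that differ only in their filters, and folds them into one statement using conditional aggregation. `AggQuery`, `merge_queries` and the TPC-H-style column names are hypothetical, and this is not necessarily the rewriting the paper uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggQuery:
    """Hypothetical single-table query: SELECT SUM(column) FROM t WHERE predicate."""
    name: str
    column: str
    predicate: str


def merge_queries(table: str, queries: list[AggQuery]) -> str:
    """Rewrite several aggregations over one table into a single SQL statement.

    Each original query becomes a conditional aggregate (SUM over CASE WHEN),
    so the table is read once instead of once per query.
    """
    if not queries:
        raise ValueError("need at least one query")
    # Rows failing a query's predicate yield NULL for that column, and SUM
    # ignores NULLs, so each output column equals the original query's result.
    aggregates = [
        f"SUM(CASE WHEN {q.predicate} THEN {q.column} END) AS {q.name}"
        for q in queries
    ]
    # The outer filter keeps only rows that at least one query needs.
    union_filter = " OR ".join(f"({q.predicate})" for q in queries)
    return f"SELECT {', '.join(aggregates)} FROM {table} WHERE {union_filter}"


batch = [
    AggQuery("air_revenue", "l_extendedprice", "l_shipmode = 'AIR'"),
    AggQuery("discounted_qty", "l_quantity", "l_discount > 0.05"),
]
print(merge_queries("lineitem", batch))
```

    Under per-byte billing on columnar storage, the merged statement is charged roughly for the union of the columns it reads rather than once per query, which is the effect described above. Real batches (joins, GROUP BY on different keys) need more elaborate rewrites than this one.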

    Flight Profiling System

    Final degree project (projecte final de carrera) carried out in collaboration with Amadeus IT Group. The main goal of the project is to develop from scratch a dynamic flight target detector, which was eventually named PaxFinder (Passengers Finder). This goal evolved over the course of the internship. At the beginning, the objective was far more generic: “The objective of the internship is to study the public data available from web-based social networks so Amadeus products can be tailored to fit each individual”. The first part was clear: the public data of people's profiles on social networks should be studied for use in this project. The second part was later specified as developing a dynamic flight target detector through the analysis of passenger profiles. Some requirements and limitations were established at the beginning of the project in order to define and limit its scope. The first requirement was to integrate Crescando into the project in order to exploit its capabilities. The second was to work only with profile data coming from social networks and Amadeus repositories; for example, information such as a traveler's travel history and background was forbidden. Given the goal and the requirements of the project, the dynamic flight target detector was built by first developing a flight profiling system and then a profile search system. The tasks that the project had to go through in order to reach its objectives were the following:
    · Studying which social networks contain data relevant for the planned usage.
    · Studying how data can be extracted, and building tools to perform the extractions from both social networks and Amadeus databases.
    · Getting familiar with Crescando.
    · Participating in the definition of the data model and creating a new Crescando database with the existing administration tools.
    · Defining the parameters of a profile.
    · Defining a methodology to classify and compare profiles.
    · Developing PaxFinder.
    Any target detector has to be properly tested. This process consists of three steps: first, offline testing; second, a study with real users; and third, an online version working at scale with real users' data. In this project we have carried out the offline testing. This testing should have been done with real data coming from Amadeus repositories and social network extractions. However, problems related to data consistency in the Amadeus repositories and to the social networks' privacy policies meant that the data stored in Crescando had to be manipulated. Nevertheless, the most important social networks have been studied, as well as their respective data extraction tools. The documentation of this work can be found in the annexes at the end of this document.

    10381 Summary and Abstracts Collection -- Robust Query Processing

    Dagstuhl seminar 10381 on robust query processing (held 19–24 September 2010) brought together a diverse set of researchers and practitioners with a broad range of expertise for the purpose of fostering discussion and collaboration regarding causes, opportunities, and solutions for achieving robust query processing. The seminar strove to build a unified view across the loosely coupled system components responsible for the various stages of database query processing. Participants were chosen for their experience with database query processing and, where possible, their prior work in academic research or in product development towards robustness in database query processing. In order to motivate, measure, and protect future advances in robust query processing, seminar 10381 focused on developing tests for measuring the robustness of query processing. In these proceedings, we first review the seminar topics, goals, and results, then present abstracts or notes of some of the seminar break-out sessions. As an appendix, we also include the robust query processing reading list that was collected and distributed to participants before the seminar began, as well as summaries of a few of those papers contributed by some participants.

    From Cooperative Scans to Predictive Buffer Management

    In analytical applications, database systems often need to sustain workloads with multiple concurrent scans hitting the same table. The Cooperative Scans (CScans) framework, which introduces an Active Buffer Manager (ABM) component into the database architecture, has been the most effective and elaborate response to this problem; it was initially developed in the X100 research prototype. We now report on our experiences of integrating Cooperative Scans into its industrial-strength successor, the Vectorwise database product. During this implementation we invented a simpler optimization of concurrent scan buffer management, called Predictive Buffer Management (PBM). PBM is based on the observation that in a workload with long-running scans, the buffer manager has considerable information about the workload in the near future, so that an approximation of the ideal OPT algorithm becomes feasible. In an evaluation on both synthetic benchmarks and a TPC-H throughput run, we compare the benefits of naive buffer management (LRU) with those of CScans, PBM and OPT, showing that PBM achieves benefits close to those of Cooperative Scans while incurring much lower architectural impact. Comment: VLDB201
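
    PBM's core observation, that long-running scans reveal which pages they will need soon, can be sketched as an OPT-style eviction rule: estimate, for every cached page, when the next scan will reach it, and evict the page whose next use lies furthest in the future. This Python sketch assumes purely sequential scans moving at a measured, constant speed with no wrap-around; `Scan`, `next_use` and `choose_victim` are hypothetical names, and it is not the Vectorwise implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Scan:
    """Hypothetical sequential scan that will read pages position..end-1."""
    position: int   # next page the scan will consume
    end: int        # one past the last page it will read
    speed: float    # observed consumption rate in pages per second (> 0)

    def time_until(self, page: int) -> float:
        """Estimated seconds until this scan needs `page`; inf if it never will."""
        if self.position <= page < self.end:
            return (page - self.position) / self.speed
        return math.inf


def next_use(page: int, scans: list[Scan]) -> float:
    """Earliest predicted use of `page` by any active scan."""
    return min((s.time_until(page) for s in scans), default=math.inf)


def choose_victim(cached_pages: set[int], scans: list[Scan]) -> int:
    """Evict the cached page whose predicted next use is furthest away."""
    # Sorting makes tie-breaking deterministic (lowest page id wins ties).
    return max(sorted(cached_pages), key=lambda p: next_use(p, scans))


scans = [Scan(position=0, end=100, speed=10.0), Scan(position=50, end=100, speed=5.0)]
print(choose_victim({5, 60, 95}, scans))  # → 95
```

    Pages that every active scan has already passed get an infinite estimate and are evicted first, which is information a recency-based policy such as LRU does not use.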

    Predictable performance and high query concurrency for data analytics

    Conventional data warehouses employ the query-at-a-time model, which maps each query to a distinct physical plan. When several queries execute concurrently, this model introduces contention and thrashing, because the physical plans, unaware of each other, compete for access to the underlying I/O and computation resources. As a result, while modern systems can efficiently optimize and evaluate a single complex data analysis query, their performance suffers significantly and can be highly erratic when multiple complex queries run at the same time. In this paper, we present Cjoin, a new design that substantially improves throughput in large-scale data analytics systems processing many concurrent join queries. In contrast to the conventional query-at-a-time model, our approach employs a single physical plan that shares I/O, computation, and tuple storage across all in-flight join queries. We use an “always on” pipeline of non-blocking operators, managed by a controller that continuously examines the current query mix and optimizes the pipeline on the fly. Our design enables data analytics engines to scale gracefully to large data sets, provide predictable execution times, and reduce contention. We implemented Cjoin as an extension to the PostgreSQL DBMS. This prototype outperforms conventional commercial systems by an order of magnitude for tens to hundreds of concurrent queries.
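
    A single physical plan serving many join queries is easier to picture with a toy version. The Python sketch below assumes one fact table joined to a single dimension table and tags every dimension row with a bitmap of the queries whose predicate it satisfies, so one pass over the fact table answers all queries at once. The bitmap representation, the batch (rather than "always on") execution and all names here are simplifying assumptions, not the PostgreSQL prototype.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict  # a tuple represented as column name -> value


@dataclass(frozen=True)
class StarQuery:
    """Hypothetical star query: SUM(measure) over fact rows whose dimension row matches."""
    qid: int                              # small integer, used as a bit position
    dim_predicate: Callable[[Row], bool]
    measure: str


def shared_star_join(fact: list[Row], dim: dict[int, Row], fk: str,
                     queries: list[StarQuery]) -> dict[int, float]:
    # Build phase: for each dimension row, record which queries it satisfies.
    dim_bits: dict[int, int] = {}
    for key, row in dim.items():
        bits = 0
        for q in queries:
            if q.dim_predicate(row):
                bits |= 1 << q.qid
        dim_bits[key] = bits

    # Probe phase: a single scan of the fact table serves every query.
    totals = {q.qid: 0.0 for q in queries}
    for row in fact:
        bits = dim_bits.get(row[fk], 0)
        if not bits:
            continue  # no query needs this fact tuple
        for q in queries:
            if (bits >> q.qid) & 1:
                totals[q.qid] += row[q.measure]
    return totals


dim = {1: {"region": "EU"}, 2: {"region": "US"}}
fact = [{"cust": 1, "price": 10.0}, {"cust": 2, "price": 5.0}, {"cust": 1, "price": 2.5}]
queries = [StarQuery(0, lambda r: r["region"] == "EU", "price"),
           StarQuery(1, lambda r: True, "price")]
print(shared_star_join(fact, dim, "cust", queries))  # → {0: 12.5, 1: 17.5}
```

    The fact table is read once regardless of how many queries are in flight; only the cheap per-tuple bitmap test grows with the number of queries.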

    In-Memory-Datenmanagement in betrieblichen Anwendungssystemen (In-Memory Data Management in Business Application Systems)

    In-memory databases keep the entire data set permanently in main memory. As a result, read accesses can be carried out far faster than in traditional database systems, since no I/O accesses to the hard disk are required. For write accesses, mechanisms have been developed that guarantee persistence and thus transactional safety. In-memory databases have been under development for quite some time and have proven themselves in specialized applications. With the increasing storage density of DRAM chips, hardware systems whose main memory can hold a company's complete operational data set have become economically affordable. This raises the question of whether in-memory databases can also be used in business application systems. Hasso Plattner, who developed the in-memory database HANA, is a protagonist of this approach. He sees considerable potential for new concepts in the development of business information systems. For example, a transactional and an analytical application could run on the same data set, i.e., a separation into operational databases on the one hand and data warehouse systems on the other would no longer be necessary in business information processing (Plattner and Zeier 2011). Not all database representatives agree, however. Larry Ellison has called the idea of using in-memory databases in business applications “wacko”, in a way more geared to media impact than to serious argument (Bube 2010). Stonebraker (2011) does see a future for in-memory databases in business applications, but still considers a separation of OLTP and OLAP applications to be sensible. [From: Introduction]

    Sharing Data and Work Across Concurrent Analytical Queries

    Today's data deluge enables organizations to collect massive amounts of data and to analyze them with an ever-increasing number of concurrent queries. Traditional data warehouses (DWs) face a challenging problem in executing this task, due to their query-centric model: each query is optimized and executed independently. This model results in high contention for resources. Thus, modern DWs depart from the query-centric model towards execution models involving the sharing of common data and work. Our goal is to show when and how a DW should employ sharing. We experimentally evaluate two sharing methodologies, based on their original prototype systems, that exploit work-sharing opportunities among concurrent queries at run time: Simultaneous Pipelining (SP), which shares intermediate results of common sub-plans, and Global Query Plans (GQP), which build and evaluate a single query plan with shared operators. First, after a short review of sharing methodologies, we show that SP and GQP are orthogonal techniques. SP can be applied to shared operators of a GQP, reducing response times by 20%-48% in workloads with numerous common sub-plans. Second, we corroborate previous results on the negative impact of SP on performance in cases of low concurrency. We attribute this behavior to a bottleneck caused by the push-based communication model of SP. We show that pull-based communication for SP eliminates the overhead of sharing altogether for low concurrency, and scales better on multi-core machines than push-based SP, further reducing response times by 82%-86% for high concurrency. Third, we perform an experimental analysis of SP, GQP and their combination, and show when each one is beneficial. We identify a trade-off between low and high concurrency. In the former case, traditional query-centric operators with SP perform better, while in the latter case, GQP with shared operators enhanced by SP gives the best results.
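
    Pull-based sharing of a common sub-plan can be illustrated with a single-threaded Python sketch: the sub-plan is evaluated at most once, its tuples are buffered, and each consumer pulls from the buffer at its own pace, so the producer never has to hand every tuple to every consumer itself. `SharedSubplan` is a hypothetical name, and this is only one simplified reading of pull-based SP; the authors' multi-threaded implementation is not described in the abstract.

```python
from typing import Iterable, Iterator


class SharedSubplan:
    """Evaluate a common sub-plan once and let many consumers pull its output."""

    def __init__(self, producer: Iterable):
        self._source = iter(producer)
        self._buffer: list = []
        self._done = False
        self.produced = 0  # tuples pulled from the underlying sub-plan

    def _fetch(self) -> bool:
        """Pull one more tuple from the sub-plan; False once it is exhausted."""
        if self._done:
            return False
        try:
            self._buffer.append(next(self._source))
        except StopIteration:
            self._done = True
            return False
        self.produced += 1
        return True

    def consumer(self) -> Iterator:
        """A new consumer that reads the shared output from the beginning."""
        i = 0
        while True:
            if i < len(self._buffer):
                yield self._buffer[i]
                i += 1
            elif not self._fetch():
                return


shared = SharedSubplan(x * x for x in range(5))  # stands in for a common sub-plan
a, b = shared.consumer(), shared.consumer()
print(next(a), next(a))          # → 0 1
print(list(b))                   # → [0, 1, 4, 9, 16]
print(list(a), shared.produced)  # → [4, 9, 16] 5
```

    The buffer here grows without bound; a practical version would discard tuples once every consumer has read them.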

    Tre cronache veneziane inedite della Houghton Library di Harvard (Three Unpublished Venetian Chronicles at Harvard's Houghton Library)

    The article describes the witnesses of three Venetian chronicles held at the Houghton Library of Harvard University in Cambridge (Massachusetts). The paper manuscript Ital. 67, dating to the 16th century, is acephalous and contains a history of Venice from 1106 to the 15th century. The narrative ends, in fact, by mentioning the noble captain Piero Loredan (1372 – 28 October 1438). The codex belonged to Mr. and Mrs. Ward M. Canaday, who donated it to the Houghton Library in 1964. The paper manuscript Ital. 178 dates to the 15th century (the terminus post quem is 1417) and contains a history of Venice from its origins to the 15th century. It is mutilated in the final part. The codex belonged first to Walter Sneyd (1809–1888), then to Charles William Previté-Orton (1877–1947). It is not currently possible to indicate the exact date when the manuscript became part of the collection of the Houghton Library, where it has been housed since 1996. The paper manuscript Riant 12 dates to the 17th century and contains a chronicle of Venice from its foundation until 1432. The codex belonged to Count Paul Edouard Didier Riant (1836–1888) and entered the library of Harvard University in 1899.