
    Pay One, Get Hundreds for Free: Reducing Cloud Costs through Shared Query Execution

    Cloud-based data analysis is now common practice because of the lower system management overhead as well as the pay-as-you-go pricing model. The pricing model, however, is not always suitable for query processing, as heavy use results in high costs. For example, in query-as-a-service systems, where users are charged per processed byte, collections of queries that frequently access the same data can become expensive. The problem is compounded by the limited options for the user to optimize query execution when using declarative interfaces such as SQL. In this paper, we show how, without modifying existing systems and without the involvement of the cloud provider, it is possible to significantly reduce the overhead, and hence the cost, of query-as-a-service systems. Our approach is based on query rewriting so that multiple concurrent queries are combined into a single query. Our experiments show that the aggregated amount of work done by the shared execution is smaller than in a query-at-a-time approach. Since queries are charged per byte processed, the cost of executing a group of queries is often the same as executing a single one of them. As an example, we demonstrate how the shared execution of the TPC-H benchmark is up to 100x and 16x cheaper in Amazon Athena and Google BigQuery, respectively, than using a query-at-a-time approach, while achieving a higher throughput.
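The cost argument in this abstract can be illustrated with a toy model. The sketch below is not the paper's query-rewriting technique; it only assumes a columnar, per-byte pricing scheme in which a column scanned once by a combined query is charged once. The `Query` class, the column names, and the byte sizes in `COLUMN_BYTES` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical column sizes in bytes (illustrative values, not from the paper).
COLUMN_BYTES = {
    "l_quantity": 8_000,
    "l_extendedprice": 8_000,
    "l_discount": 8_000,
    "l_shipdate": 4_000,
}


@dataclass(frozen=True)
class Query:
    """A query reduced to the set of columns it scans (an assumption of this sketch)."""
    name: str
    columns: frozenset


def bytes_scanned(columns):
    """Total bytes charged for scanning each listed column once."""
    return sum(COLUMN_BYTES[c] for c in columns)


def cost_query_at_a_time(queries):
    """Every query pays for its own scan, even when columns overlap."""
    return sum(bytes_scanned(q.columns) for q in queries)


def cost_shared(queries):
    """One combined query scans the union of all referenced columns once."""
    union = frozenset().union(*(q.columns for q in queries))
    return bytes_scanned(union)


queries = [
    Query("q1", frozenset({"l_quantity", "l_shipdate"})),
    Query("q2", frozenset({"l_extendedprice", "l_discount", "l_shipdate"})),
    Query("q3", frozenset({"l_quantity", "l_shipdate"})),
]

separate = cost_query_at_a_time(queries)
shared = cost_shared(queries)
print(f"query-at-a-time: {separate} bytes, shared: {shared} bytes")
```

Under these assumptions the shared cost never exceeds the query-at-a-time cost, and the gap widens as more queries touch the same columns.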

    How to Balance Privacy and Money through Pricing Mechanism in Personal Data Market

    A personal data market is a platform with three participants: data owners (individuals), data buyers, and a market maker. Data owners who provide personal data are compensated according to their privacy loss. Data buyers can submit a query and pay for the result according to their desired accuracy. The market maker coordinates between data owners and buyers. This framework has previously been studied based on differential privacy. However, the previous study assumes that data owners can accept any level of privacy loss and that data buyers can conduct transactions without regard to their financial budget. In this paper, we propose a practical personal data trading framework that is able to strike a balance between money and privacy. In order to gain insights into user preferences, we first conducted an online survey on human attitudes toward privacy and interest in personal data trading. Second, we identify the 5 key principles of a personal data market, which are important for designing a reasonable trading framework and pricing mechanism. Third, we propose a reasonable trading framework for personal data which provides an overview of how the data is traded. Fourth, we propose a balanced pricing mechanism which computes the query price for data buyers and the compensation for data owners (whose data are utilized) as a function of their privacy loss. The main goal is to ensure fair trading for both parties. Finally, we will conduct an experiment to evaluate the output of our proposed pricing mechanism in comparison with other previously proposed mechanisms.

    PRICE DEMAND MODEL FOR A CLOUD CACHE

    Cloud applications that offer data management services are emerging. Such clouds support caching of data in order to provide quality query services. The users can query the cloud data, paying the price for the infrastructure they use. Cloud management necessitates an economy that manages the service of multiple users in an efficient but also resource-economic way that allows for cloud profit. Naturally, the maximization of cloud profit given some guarantees for user satisfaction presumes an appropriate price-demand model that enables optimal pricing of query services. The model should be plausible in that it reflects the correlation of cache structures involved in the queries. Optimal pricing is achieved based on a dynamic pricing scheme that adapts to changes over time. This paper proposes a novel price-demand model designed for a cloud cache and a dynamic pricing scheme for queries executed in the cloud cache. The pricing solution employs a novel method that estimates the correlations of the cache services in a time-efficient manner. The experimental study shows the efficiency of the solution.

    Efficient dictionary compression for processing RDF big data using Google BigQuery

    The Resource Description Framework (RDF) data model is used on the Web to express billions of structured statements on a wide range of topics, including government, publications, life sciences, etc. Consequently, processing and storing this data requires the provision of high-specification systems, in terms of both storage and computational capabilities. On the other hand, cloud-based big data services such as Google BigQuery can be used to store and query this data without any upfront investment. Google BigQuery pricing is based on the size of the data being stored or queried, but given that RDF statements contain long Uniform Resource Identifiers (URIs), the cost of querying and storing RDF big data can increase rapidly. In this paper we present and evaluate a novel and efficient dictionary compression algorithm which is faster, generates small dictionaries that can fit in memory, and achieves a better compression rate than other large-scale RDF dictionary compression approaches. Consequently, our algorithm also reduces the BigQuery storage and query cos
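To make the idea of dictionary compression concrete, here is a minimal sketch of plain dictionary encoding for RDF triples: each distinct term is replaced by a small integer ID, so a long URI is stored once rather than in every statement. This is the generic textbook technique, not the algorithm proposed in the paper; the function names and the `example.org` URIs are assumptions made for illustration.

```python
def encode_triples(triples):
    """Replace every term with an integer ID, assigned in order of first appearance."""
    dictionary = {}
    encoded = []
    for triple in triples:
        # setdefault assigns the next free ID only if the term is new.
        ids = tuple(dictionary.setdefault(term, len(dictionary)) for term in triple)
        encoded.append(ids)
    return dictionary, encoded


def decode_triples(dictionary, encoded):
    """Invert the dictionary and restore the original terms."""
    terms_by_id = {term_id: term for term, term_id in dictionary.items()}
    return [tuple(terms_by_id[i] for i in ids) for ids in encoded]


# Hypothetical example data.
triples = [
    ("http://example.org/alice", "http://example.org/knows", "http://example.org/bob"),
    ("http://example.org/bob", "http://example.org/knows", "http://example.org/alice"),
]

dictionary, encoded = encode_triples(triples)
print(encoded)

raw_chars = sum(len(term) for triple in triples for term in triple)
dict_chars = sum(len(term) for term in dictionary)
print(f"characters in raw triples: {raw_chars}, in dictionary: {dict_chars}")
```

Because storage and queries are billed by data size, the encoded integer triples plus one copy of each URI are typically much smaller than the raw statements once terms repeat.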

    The Design of Arbitrage-Free Data Pricing Schemes

    Motivated by a growing market that involves buying and selling data over the web, we study pricing schemes that assign value to queries issued over a database. Previous work studied pricing mechanisms that compute the price of a query by extending a data seller's explicit prices on certain queries, or investigated the properties that a pricing function should exhibit without detailing a generic construction. In this work, we present a formal framework for pricing queries over data that allows the construction of general families of pricing functions, with the main goal of avoiding arbitrage. We consider two types of pricing schemes: instance-independent schemes, where the price depends only on the structure of the query, and answer-dependent schemes, where the price also depends on the query output. Our main result is a complete characterization of the structure of pricing functions in both settings, by relating it to properties of a function over a lattice. We use our characterization, together with information-theoretic methods, to construct a variety of arbitrage-free pricing functions. Finally, we discuss various tradeoffs in the design space and present techniques for efficient computation of the proposed pricing functions. Comment: full pape

    A Theory of Pricing Private Data

    Personal data has value to both its owner and to institutions who would like to analyze it. Privacy mechanisms protect the owner's data while releasing to analysts noisy versions of aggregate query results. But such strict protections of individuals' data have not yet found wide use in practice. Instead, Internet companies, for example, commonly provide free services in return for valuable sensitive information from users, which they exploit and sometimes sell to third parties. As awareness of the value of personal data increases, so does the drive to compensate the end user for her private information. The idea of monetizing private data can improve over the narrower view of hiding private data, since it empowers individuals to control their data through financial means. In this paper we propose a theoretical framework for assigning prices to noisy query answers, as a function of their accuracy, and for dividing the price amongst data owners who deserve compensation for their loss of privacy. Our framework adopts and extends key principles from both differential privacy and query pricing in data markets. We identify essential properties of the price function and micro-payments, and characterize valid solutions. Comment: 25 pages, 2 figures. Best Paper Award, to appear in the 16th International Conference on Database Theory (ICDT), 201