10,522 research outputs found
Pay One, Get Hundreds for Free: Reducing Cloud Costs through Shared Query Execution
Cloud-based data analysis is nowadays common practice because of the lower
system management overhead as well as the pay-as-you-go pricing model. The
pricing model, however, is not always suitable for query processing as heavy
use results in high costs. For example, in query-as-a-service systems, where
users are charged per processed byte, collections of queries accessing the same
data frequently can become expensive. The problem is compounded by the limited
options for the user to optimize query execution when using declarative
interfaces such as SQL. In this paper, we show how, without modifying existing
systems and without the involvement of the cloud provider, it is possible to
significantly reduce the overhead, and hence the cost, of query-as-a-service
systems. Our approach is based on query rewriting so that multiple concurrent
queries are combined into a single query. Our experiments show the aggregated
amount of work done by the shared execution is smaller than in a
query-at-a-time approach. Since queries are charged per byte processed, the
cost of executing a group of queries is often the same as executing a single
one of them. As an example, we demonstrate how the shared execution of the
TPC-H benchmark is up to 100x and 16x cheaper in Amazon Athena and Google
BigQuery than using a query-at-a-time approach while achieving a higher
throughput
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies for optimized
solution to a specific real world problem, big data system are not an exception
to any such rule. As far as the storage aspect of any big data system is
concerned, the primary facet in this regard is a storage infrastructure and
NoSQL seems to be the right technology that fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents
feature and use case analysis and comparison of the four main data models
namely document oriented, key value, graph and wide column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings forth second facet of big data storage, big data file
formats, into picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage and its challenges and future prospects have
also been discussed
SoK: Cryptographically Protected Database Search
Protected database search systems cryptographically isolate the roles of
reading from, writing to, and administering the database. This separation
limits unnecessary administrator access and protects data in the case of system
breaches. Since protected search was introduced in 2000, the area has grown
rapidly; systems are offered by academia, start-ups, and established companies.
However, there is no best protected search system or set of techniques.
Design of such systems is a balancing act between security, functionality,
performance, and usability. This challenge is made more difficult by ongoing
database specialization, as some users will want the functionality of SQL,
NoSQL, or NewSQL databases. This database evolution will continue, and the
protected search community should be able to quickly provide functionality
consistent with newly invented databases.
At the same time, the community must accurately and clearly characterize the
tradeoffs between different approaches. To address these challenges, we provide
the following contributions:
1) An identification of the important primitive operations across database
paradigms. We find there are a small number of base operations that can be used
and combined to support a large number of database paradigms.
2) An evaluation of the current state of protected search systems in
implementing these base operations. This evaluation describes the main
approaches and tradeoffs for each base operation. Furthermore, it puts
protected search in the context of unprotected search, identifying key gaps in
functionality.
3) An analysis of attacks against protected search for different base
queries.
4) A roadmap and tools for transforming a protected search system into a
protected database, including an open-source performance evaluation platform
and initial user opinions of protected search.Comment: 20 pages, to appear to IEEE Security and Privac
- …