Efficient integrity checks for join queries in the cloud
Cloud computing is receiving massive interest from users and companies for its convenient support of scalable access to data and services. The variety and diversification of offers by cloud providers allow users to selectively adopt the storage and computational services that best suit their needs, including cost-saving considerations. In such an open context, security remains a major concern, as the confidentiality and integrity of data and of the queries over them can be at risk. In this paper, we present efficient techniques to verify the integrity of join queries computed by potentially untrusted cloud providers, while also protecting data and computation confidentiality. Our techniques support joins among multiple data sources and introduce limited overhead in query computation. They also enable economic savings, as the ability to assess integrity increases the spectrum of offers that can be considered for performing the computation. Formal analysis and experimental evaluations confirm the effectiveness and efficiency of our solutions.
SQLCheck: Automated Detection and Diagnosis of SQL Anti-Patterns
The emergence of database-as-a-service platforms has made deploying database
applications easier than before. Now, developers can quickly create scalable
applications. However, designing performant, maintainable, and accurate
applications is challenging. Developers may unknowingly introduce anti-patterns
in the application's SQL statements. These anti-patterns are design decisions
that are intended to solve a problem, but often lead to other problems by
violating fundamental design principles.
In this paper, we present SQLCheck, a holistic toolchain for automatically
finding and fixing anti-patterns in database applications. We introduce
techniques for automatically (1) detecting anti-patterns with high precision
and recall, (2) ranking the anti-patterns based on their impact on performance,
maintainability, and accuracy of applications, and (3) suggesting alternative
queries and changes to the database design to fix these anti-patterns. We
demonstrate the prevalence of these anti-patterns in a large collection of
queries and databases collected from open-source repositories. We introduce an
anti-pattern detection algorithm that augments query analysis with data
analysis. We present a ranking model for characterizing the impact of
frequently occurring anti-patterns. We discuss how SQLCheck suggests fixes for
high-impact anti-patterns using rule-based query refactoring techniques. Our
experiments demonstrate that SQLCheck enables developers to create more
performant, maintainable, and accurate applications.
Comment: 18 pages (14 page paper, 1 page references, 2 page Appendix), 12
figures, Conference: SIGMOD'2
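To make the idea of rule-based anti-pattern detection and impact ranking concrete, here is a minimal sketch. It is not SQLCheck's algorithm: the rule names, regular expressions, and impact weights below are hypothetical, and the sketch inspects only the query text, whereas the abstract describes augmenting query analysis with data analysis.

```python
import re

# Hypothetical rules: (name, pattern, illustrative impact weight).
# These are assumptions for illustration, not taken from SQLCheck.
RULES = [
    ("select-star", re.compile(r"\bselect\s+\*", re.IGNORECASE), 2),
    ("leading-wildcard-like", re.compile(r"\blike\s+'%", re.IGNORECASE), 3),
    ("order-by-rand", re.compile(r"\border\s+by\s+rand\s*\(", re.IGNORECASE), 3),
]


def detect_anti_patterns(query):
    """Return (name, weight) for every matching rule, highest weight first."""
    hits = [(name, weight) for name, pattern, weight in RULES if pattern.search(query)]
    # Rank by descending weight; break ties alphabetically for stable output.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    for name, weight in detect_anti_patterns(
        "SELECT * FROM users WHERE name LIKE '%son'"
    ):
        print(name, weight)
```

A purely textual check like this can miss anti-patterns that only show up in the schema or the stored data, which is presumably why the paper combines query analysis with data analysis.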
The Data Lakehouse: Data Warehousing and More
Relational Database Management Systems designed for Online Analytical
Processing (RDBMS-OLAP) have been foundational to democratizing data and
enabling analytical use cases such as business intelligence and reporting for
many years. However, RDBMS-OLAP systems present some well-known challenges.
They are primarily optimized only for relational workloads, lead to
proliferation of data copies which can become unmanageable, and since the data
is stored in proprietary formats, it can lead to vendor lock-in, restricting
access to engines, tools, and capabilities beyond what the vendor offers. As
the demand for data-driven decision making surges, the need for a more robust
data architecture to address these challenges becomes ever more critical. Cloud
data lakes have addressed some of the shortcomings of RDBMS-OLAP systems, but
they present their own set of challenges. More recently, organizations have
often followed a two-tier architectural approach to take advantage of both
these platforms, leveraging both cloud data lakes and RDBMS-OLAP systems.
However, this approach brings additional challenges, complexities, and
overhead. This paper discusses how a data lakehouse, a new architectural
approach, achieves the same benefits of an RDBMS-OLAP and cloud data lake
combined, while also providing additional advantages. We take today's data
warehousing and break it down into implementation independent components,
capabilities, and practices. We then take these aspects and show how a
lakehouse architecture satisfies them. Then, we go a step further and discuss
what additional capabilities and benefits a lakehouse architecture provides
over an RDBMS-OLAP.
Distributed Relational Database Performance in Cloud Computing: an Investigative Study
This study identifies weak points in major relational database management systems (RDBMSs) running in a Cloud Computing environment in which the nodes were geographically distant. It asks whether running databases in the Cloud brings operational disadvantages. The findings indicate that RDBMS performance measures in a Cloud Computing environment are inconsistent and that a contributing factor to poor performance is the public or shared infrastructure of the Internet. They also indicate that RDBMSs in a Cloud Computing environment become network-bound in addition to being I/O-bound. The study concludes that Cloud Computing creates an environment that negatively impacts the performance of RDBMSs that were designed for n-tier architecture.