2,022 research outputs found

    Efficient integrity checks for join queries in the cloud

    Get PDF
    Cloud computing is receiving massive interest from users and companies for its convenient support of scalable access to data and services. The variety and diversification of offers by cloud providers allow users to selectively adopt storage and computational services as they best suit their needs, including cost saving considerations. In such an open context, security remains a major concern, as confidentiality and integrity of data and queries over them can be at risk. In this paper, we present efficient techniques to verify the integrity of join queries computed by potentially untrusted cloud providers, while also protecting data and computation confidentiality. Our techniques support joins among multiple data sources and introduce a limited overhead in query computation, enabling also economical savings, as the ability to assess integrity increases the spectrum of offers that can be considered for performing the computation. Formal analysis and experimental evaluations confirm the effectiveness and efficiency of our solutions

    SQLCheck: Automated Detection and Diagnosis of SQL Anti-Patterns

    Full text link
    The emergence of database-as-a-service platforms has made deploying database applications easier than before. Now, developers can quickly create scalable applications. However, designing performant, maintainable, and accurate applications is challenging. Developers may unknowingly introduce anti-patterns in the application's SQL statements. These anti-patterns are design decisions that are intended to solve a problem, but often lead to other problems by violating fundamental design principles. In this paper, we present SQLCheck, a holistic toolchain for automatically finding and fixing anti-patterns in database applications. We introduce techniques for automatically (1) detecting anti-patterns with high precision and recall, (2) ranking the anti-patterns based on their impact on performance, maintainability, and accuracy of applications, and (3) suggesting alternative queries and changes to the database design to fix these anti-patterns. We demonstrate the prevalence of these anti-patterns in a large collection of queries and databases collected from open-source repositories. We introduce an anti-pattern detection algorithm that augments query analysis with data analysis. We present a ranking model for characterizing the impact of frequently occurring anti-patterns. We discuss how SQLCheck suggests fixes for high-impact anti-patterns using rule-based query refactoring techniques. Our experiments demonstrate that SQLCheck enables developers to create more performant, maintainable, and accurate applications.Comment: 18 pages (14 page paper, 1 page references, 2 page Appendix), 12 figures, Conference: SIGMOD'2

    The Data Lakehouse: Data Warehousing and More

    Full text link
    Relational Database Management Systems designed for Online Analytical Processing (RDBMS-OLAP) have been foundational to democratizing data and enabling analytical use cases such as business intelligence and reporting for many years. However, RDBMS-OLAP systems present some well-known challenges. They are primarily optimized only for relational workloads, lead to proliferation of data copies which can become unmanageable, and since the data is stored in proprietary formats, it can lead to vendor lock-in, restricting access to engines, tools, and capabilities beyond what the vendor offers. As the demand for data-driven decision making surges, the need for a more robust data architecture to address these challenges becomes ever more critical. Cloud data lakes have addressed some of the shortcomings of RDBMS-OLAP systems, but they present their own set of challenges. More recently, organizations have often followed a two-tier architectural approach to take advantage of both these platforms, leveraging both cloud data lakes and RDBMS-OLAP systems. However, this approach brings additional challenges, complexities, and overhead. This paper discusses how a data lakehouse, a new architectural approach, achieves the same benefits of an RDBMS-OLAP and cloud data lake combined, while also providing additional advantages. We take today's data warehousing and break it down into implementation independent components, capabilities, and practices. We then take these aspects and show how a lakehouse architecture satisfies them. Then, we go a step further and discuss what additional capabilities and benefits a lakehouse architecture provides over an RDBMS-OLAP

    Distributed Relational Database Performance in Cloud Computing: an Investigative Study

    Get PDF
    Weak points in major relational database systems in a Cloud Computing environment, in which the nodes were geographically distant, are identified. The study questions whether running databases in the Cloud provides operational disadvantages. Findings indicate that performance measures of RDBMS’ in a Cloud Computing environment are inconsistent and that a contributing factor to poor performance is the public or shared infrastructure on the Internet. Also, that RDBMS’ in a Cloud Computing environment become network-bound in addition to being I/O bound. The study concludes that Cloud Computing creates an environment that negatively impacts RDBMS performance for RDBMS’ that were designed for n-tier architecture
    • …
    corecore